RE: [PATCH net-next v9] openvswitch: enable NSH support

2017-09-29 Thread Jan Scheurich
> > The optimization Yi refers to only affects the slow path translation.
> >
> > OVS 2.8 does not immediately trigger an immediate recirculation after 
> > translating
> > encap(nsh,...). There is no need to do so as the flow key of the resulting 
> > packet
> > can be determined from the encap() action and its properties. Translation
> > continues with the rewritten flow key and subsequent OpenFlow actions will
> > typically set the new fields in the new NSH header. The push_nsh datapath 
> > action
> > (including all NSH header fields) is only generated at the next commit, 
> > e.g. for
> > output, cloning, recirculation, encap/decap or another destructive change of
> > the flow key.
> >
> > The implementation of push_nsh in the user-space datapath does not update
> > the miniflow (key) of the packet, only the packet data and some metadata.
> > If the packet needs to be looked up again the slow path triggers 
> > recirculation
> > to re-parse the packet. There should be no need for the datapath push_nsh
> > action to try to update the flow key.
> 
> Thanks Jan for clarification, it can still work after removing that
> line, our flows didn't match it after push_nsh, it is output to
> VxLAN-gpe port after push_nsh, I'm not sure if we can match dl_type and NSH
> fields if we don't output and don't recirculate.

No worries, a packet cannot be matched again in the datapath unless it is 
recirculated. And recirculation today always implies re-parsing. 

In the future we want to look into possibilities to optimize performance of 
recirculation, for example by skipping the parsing stage if it is unnecessary.
For that we may need to invalidate the flow key in packet metadata when
the packet is modified without corresponding update of the key itself. But that
is music of the future.

/Jan


RE: [PATCH net-next v9] openvswitch: enable NSH support

2017-09-29 Thread Jan Scheurich
> From: Yang, Yi [mailto:yi.y.y...@intel.com]
> Sent: Friday, 29 September, 2017 08:41
> To: Pravin Shelar <pshe...@ovn.org>
> Cc: Jiri Benc <jb...@redhat.com>; netdev@vger.kernel.org; 
> d...@openvswitch.org; e...@erig.me; da...@davemloft.net; Jan Scheurich
> <jan.scheur...@ericsson.com>
> Subject: Re: [PATCH net-next v9] openvswitch: enable NSH support
> 
> On Fri, Sep 29, 2017 at 02:28:38AM +0800, Pravin Shelar wrote:
> > On Tue, Sep 26, 2017 at 6:39 PM, Yang, Yi <yi.y.y...@intel.com> wrote:
> > > On Tue, Sep 26, 2017 at 06:49:14PM +0800, Jiri Benc wrote:
> > >> On Tue, 26 Sep 2017 12:55:39 +0800, Yang, Yi wrote:
> > >> > After push_nsh, the packet won't be recirculated to flow pipeline, so
> > >> > key->eth.type must be set explicitly here, but for pop_nsh, the packet
> > >> > will be recirculated to flow pipeline, it will be reparsed, so
> > >> > key->eth.type will be set in packet parse function, we needn't handle 
> > >> > it
> > >> > in pop_nsh.
> > >>
> > >> This seems to be a very different approach than what we currently have.
> > >> Looking at the code, the requirement after "destructive" actions such
> > >> as pushing or popping headers is to recirculate.
> > >
> > > This is optimization proposed by Jan Scheurich, recurculating after 
> > > push_nsh
> > > will impact on performance, recurculating after pop_nsh is unavoidable, So
> > > also cc jan.scheur...@ericsson.com.
> > >
> > > Actucally all the keys before push_nsh are still there after push_nsh,
> > > push_nsh has updated all the nsh keys, so recirculating remains avoidable.
> > >
> >
> >
> > We should keep existing model for this patch. Later you can submit
> > optimization patch with specific use cases and performance
> > improvement. So that we can evaluate code complexity and benefits.
> 
> Ok, I'll remove the below line in push_nsh and send out v11, thanks.
> 
>   key->eth.type = htons(ETH_P_NSH);

The optimization Yi refers to only affects the slow path translation. 

OVS 2.8 does not immediately trigger an immediate recirculation after 
translating 
encap(nsh,...). There is no need to do so as the flow key of the resulting 
packet 
can be determined from the encap() action and its properties. Translation 
continues with the rewritten flow key and subsequent OpenFlow actions will 
typically set the new fields in the new NSH header. The push_nsh datapath 
action 
(including all NSH header fields) is only generated at the next commit, e.g. 
for 
output, cloning, recirculation, encap/decap or another destructive change of 
the flow key.

The implementation of push_nsh in the user-space datapath does not update
the miniflow (key) of the packet, only the packet data and some metadata. 
If the packet needs to be looked up again the slow path triggers recirculation
to re-parse the packet. There should be no need for the datapath push_nsh 
action to try to update the flow key.

BR, Jan


RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-06 Thread Jan Scheurich
> > There is no way we can re-use the existing TLV tunnel metadata
> > infrastructure in OVS for matching and setting NSH MD2 TLV headers. We
> > will need to introduce a new (perhaps similar) scheme for modelling
> > generic TLV match registers in OVS that are assigned to protocol TLVs
> > by the controller. This is FFS.
> 
> This is what I don't understand.
> 
> Why can't you just reuse the space in the struct sw_flow_key where
> geneve would put in their metadata. There are 255 empty bytes at the
> beginning if you don't have other tunnel metadata anyway.
> 
> If you receive packets over vxlan(gpe), tun_opts gets populated with an
> ip_tunnel_key. Couldn't you use the options space in there after the
> ip_tunnel_key to store the NSH context just for the sake of storing them
> somewhere instead of adding 16 bytes to sw_flow_key?

There is a significant conceptual difference between tunnel metadata (copied 
from a popped tunnel header) and packed match fields extracted during parsing 
of the packets. If we'd store them in the same space in the sw_flow_key struct, 
we are calling for trouble.

NSH is transport agnostic, it should work over Ethernet, VXLAN(GPE) and other 
transport tunnels. Think about an NSH packet arriving on an Geneve tunnel port. 
Any Geneve tunnel options have already been stored in the tun_opts metadata 
bytes. Now the datapath parses the NSH header and overwrites the tun_opts 
metadata with the NSH metadata. This would break the OVS semantics.

I absolutely understand your concern about efficient space utilization in the 
flow struct for TLV match fields and it will be part of the design challenge 
for MD2 TLV support to find a good balance between memory and run-time 
efficiency. But that is FFS. For the four fixed size MD1 headers the decision 
has been to include them as additional attributes in the flow key.

BR, Jan






RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-06 Thread Jan Scheurich
> >> Yes, I wrote that in my previous mail. I wonder why NSH context metadata
> >> is not in tun_metadata as well?
> >
> > tun_metadata is tunnel metadata, GENEVE needs tunnel port, but NSH is
> > not so, NSH can't directly use tun_metadata, for MD type 2, we need to a
> > lot of rework on tun_metadata to make it shared between GENEVE and NSH,
> > I don't think this can happen in near term. So tun_metadata isn't option
> > for this now.
> 
> Sorry, I couldn't follow you. Why can't you store the context headers in
> tun_metadata exactly?
> 

I think we mixing things. Let me try to clarify:

1. NSH context metadata has end-to-end significance for the SFP. They must be 
part of the NSH header and cannot be transported as tunnel metadata, because 
transport tunnels (e.g. Geneve) only connect pairs of SFFs in the path. So we 
need OVS to be able to match on and set NSH context header fields, also for MD2 
TLVs in the future. 

2. OVS today has support for matching on TLV tunnel metadata after termination 
of a Geneve tunnel. This infrastructure is only usable for OVS tunnel ports 
(like Geneve) but not for matching on TLV headers of the NSH protocol, which is 
not modelled as an OVS tunnel port but handled in the OpenFlow pipeline (with 
generic encp/decap actions to enter/terminate an NSH SFP). This was a strategic 
decision by the OVS community two years ago.

There is no way we can re-use the existing TLV tunnel metadata infrastructure 
in OVS for matching and setting NSH MD2 TLV headers. We will need to introduce 
a new (perhaps similar) scheme for modelling generic TLV match registers in OVS 
that are assigned to protocol TLVs by the controller. This is FFS.

BR, Jan



RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-05 Thread Jan Scheurich
Hi Hannes,

Please have a look at the Google doc that sketches the overall solution to 
support NSH in OVS. 
https://drive.google.com/open?id=1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8

In details it is slightly outdated but the NSH MD1 support (and all 
prerequisites like PTAP and Generic Encap/Decap) have been implemented in OVS 
2.8 (ofproto layer and the userspace datapath). 

The NSH use cases are discussed in the chapter "Support for NSH". OVS is 
primarily targeting the Classifier and SFF use cases. Obviously the classifier 
must be able to set the MD1 context headers. The final SFF must be able to 
inspect the context headers, typically to determine the L2 or L3 forwarding 
context (e.g. VRF) in which to forward the decapsulated packet on to its final 
destination.

This information has end-to-end significance for the SFP and is encoded by the 
classifier in the NSH context headers. It cannot be put into transport tunnel 
options of e.g. a Geneve tunnel connecting two SFFs along the SFP.

We are now trying to establish feature parity in the kernel datapath. If the 
kernel datapath chooses not to support the MD1 fields, OVS with kernel datapath 
will not be able to address the classifier and final SFF use cases.

BR, Jan

> -Original Message-
> From: Hannes Frederic Sowa [mailto:han...@stressinduktion.org]
> Sent: Tuesday, 05 September, 2017 12:30
> To: Yang, Yi <yi.y.y...@intel.com>
> Cc: netdev@vger.kernel.org; d...@openvswitch.org; jb...@redhat.com; 
> e...@erig.me; b...@ovn.org; Jan Scheurich
> <jan.scheur...@ericsson.com>
> Subject: Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
> 
> "Yang, Yi" <yi.y.y...@intel.com> writes:
> 
> > I'm not sure what new action you expect to bring here, I think group
> > action is just for this, as you said it isn't only bound to NSH, you can
> > start a new thread to discuss this. I don't think it is in scope of NSH.
> 
> It is in scope of this discussion as you will provide a user space API
> that makes the NSH context fields accessible from user space in a
> certain way. If you commit to this, there is no way going back.
> 
> I haven't yet grasped the idea on how those fields will be used in OVS
> besides load balancing. Even for load balancing the tunnel itself
> (vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
> entropy to do per-flow load balancing. What else is needed?  Why a
> context header for that? You just need multiple action chains and pick
> one randomly.
> 
> The only protocol that I can compare that to is geneve with TLVs, but
> the TLVs are global and uniquie and a property of the networking
> forwarding backplane and not a property of the path inside a tenant. So
> I expect this actually to be the first case where I think that matters.
> 
> Why are context labels that special that they are not part of tun_ops?
> 
> Thanks,
> Hannes


RE: [PATCH net-next v7] openvswitch: enable NSH support

2017-09-05 Thread Jan Scheurich
> From: Jiri Benc [mailto:jb...@redhat.com]

> > So what is the correct layout for MASKED_SET action with nested fields?
> > 1. All nested values, followed by all nested masks, or
> > 2. For each nested field value followed by mask?
> >
> > I guess alternative 1, but just to be sure.
> 
> It's 2. Alternative 1 breaks netlink assumptions, is ugly to implement
> and probably impossible to be properly validated.

OK. For outsiders this was far from obvious :-)

So, for OVS_ACTION_ATTR_SET_MASKED any nested attribute, no matter on which 
nesting level, must contain value directly followed by mask.

If that is the principle of handling masks in Netlink APIs, than we must adhere 
to it.

BR, Jan


RE: [PATCH net-next v7] openvswitch: enable NSH support

2017-09-04 Thread Jan Scheurich
> On Mon, 4 Sep 2017 14:07:45 +0000, Jan Scheurich wrote:
> > Then perhaps I misunderstood your comment. I thought you didn't like that 
> > the
> > SET_MASKED action wrapped OVS_KEY_ATTR_NSH which in itself was nested.
> > I was aiming to avoid this by lifting the two components of the NSH header
> > components to the top level:
> > OVS_NSH_ATTR_BASE_HEADER --> OVS_KEY_ATTR_NSH_BASE_HEADER
> > OVS_NSH_ATTR_MD1_CONTEXT --> OVS_KEY_ATTR_NSH_MD1_CONTEXT
> 
> No, this should be a nested attr.
> 
> I objected to the way value+mask combo is handled.

OK, sorry for the confusion. 

So what is the correct layout for MASKED_SET action with nested fields?
1. All nested values, followed by all nested masks, or
2. For each nested field value followed by mask?

I guess alternative 1, but just to be sure.

Jan


RE: [PATCH net-next v7] openvswitch: enable NSH support

2017-09-04 Thread Jan Scheurich
> > So is what you are suggesting the following?
> >
> > For matching:
> > OVS_KEY_ATTR_NSH_BASE_HEADERmandatory
> > OVS_KEY_ATTR_NSH_MD1_CONTEXToptional, in case MDTYPE == MD1
> 
> This needs to be:
> 
> OVS_KEY_ATTR_NSH
>   OVS_KEY_ATTR_NSH_BASE_HEADER
>   OVS_KEY_ATTR_NSH_MD1_CONTEXT
> 
> > For setting:
> > OVS_ACTION_ATTR_SET_MASKED
> > -OVS_KEY_ATTR_NSH_BASE_HEADER   when setting any base header 
> > fields
> > OVS_ACTION_ATTR_SET_MASKED
> > -OVS_KEY_ATTR_NSH_MD1_CONTEXT   when setting any MD1 context data fields
> 
> This needs to be:
> 
> OVS_ACTION_ATTR_SET_MASKED
>   OVS_KEY_ATTR_NSH
>   OVS_KEY_ATTR_NSH_BASE_HEADER
>   OVS_KEY_ATTR_NSH_MD1_CONTEXT
> 

Then perhaps I misunderstood your comment. I thought you didn't like that the 
SET_MASKED action wrapped OVS_KEY_ATTR_NSH which in itself was nested.
I was aiming to avoid this by lifting the two components of the NSH header 
components to the top level:
OVS_NSH_ATTR_BASE_HEADER --> OVS_KEY_ATTR_NSH_BASE_HEADER
OVS_NSH_ATTR_MD1_CONTEXT --> OVS_KEY_ATTR_NSH_MD1_CONTEXT

> In your example, there's no way to distinguish OVS_KEY_ATTR_ENCAP and
> OVS_KEY_ATTR_NSH_BASE_HEADER, they are both "1".

See above. The two would be separate values in the same enum, i.e. distinct.

> > Should we consider to stick to that simple and efficient approach? As far
> > As I can see it will be generic for the foreseeable future.
> 
> I'm not sure. From the kernel point of view, we want the same
> functionality offered by the openvswitch module and by a tc action.
> 
> In theory, it can be the tc tool that constructs the header from the
> command line but there's no precedent. Dunno.

Not sure I have the full picture here. Are you saying that the tc tool uses the 
same
Netlink API to the kernel as OVS? What would be the back-end for tc? Also the
openvswitch kernel module? 

BR, Jan



RE: [PATCH net-next v7] openvswitch: enable NSH support

2017-09-04 Thread Jan Scheurich
> On Mon, 4 Sep 2017 16:00:05 +0800, Yang, Yi wrote:
> > I think we have had similiar discussion about this, the issue is we have
> > no way to handle both MD type 1 and MD type 2 by using a common flow key 
> > struct.
> >
> > So we have to do so, there is miniflow to handle such issue in
> > userspace, but kernel data path didn't provide such mechanism. I think
> > every newly-added key will result in the same space-wasting issue for
> > kernel data path, isn't it?
> 
> Yes. And it would be surprising if it didn't have an effect on
> performance.
> 
> I don't care that much as this is a result of openvswitch module design
> decisions but it still would be good to reach a consensus before
> implementing uAPI that might not be needed in the end.

Matching on NSH MD1 context headers is definitely required. So these must
be part of the flow.

> 
> > OVS_ATTR_KEY_NSH is the first case to use nested attributes for set
> > action, so no reference code for this, set action has two use cases, one
> > has mask but the other hasn't, so it will be easier an nested attribute is
> > handled as a whole, in user space, nsh_key_from_nlattr is used in
> > several places.
> 
> I very much disagree. What you're doing is very poor design as can be
> seen from the code where you have to do weird gymnastics to implement
> that. Compare that to a much simple case where each attribute carries
> its own value and mask.

I have no principle objections against splitting NSH base header and MD1 context
headers into independent key attributes on the uAPI if that simplifies the code.
But we need to full picture here:

So is what you are suggesting the following?

For matching:
OVS_KEY_ATTR_NSH_BASE_HEADERmandatory
OVS_KEY_ATTR_NSH_MD1_CONTEXToptional, in case MDTYPE == MD1

For setting:
OVS_ACTION_ATTR_SET_MASKED
-OVS_KEY_ATTR_NSH_BASE_HEADER   when setting any base header 
fields
OVS_ACTION_ATTR_SET_MASKED
-OVS_KEY_ATTR_NSH_MD1_CONTEXT   when setting any MD1 context data fields

For push_nsh:
OVS_ACTION_ATTR_PUSH_NSH
-OVS_KEY_ATTR_NSH_BASE_HEADER   mandatory
-OVS_NSH_ATTR_CONTEXT_HEADERS   optional (opaque metadata octet 
stream) 

So push_nsh would still require nested attributes. Is that what we want (see 
below).

> > How can we do so? I see batch process code in user space implementation,
> > but kernel data path didn't such infrastructure,
> 
> You're right that there's no infrastructure. I somewhat agree that it
> might be out of scope of this patch and it can be optimized later.
> That's why I also included other suggestions to speed this up until we
> implement better parsing: namely, verify correctness once (at the time
> it is set from user space) and just expect things to be correct in the
> fast path.
> 
> > how can we know next push_nsh uses the same nsh header as previous
> > one?
> 
> We store the prepopulated header together with the action.

I agree.

In our original modelling of OVS_ACTION_ATTR_PUSH_NSH in OVS 2.8 we 
introduced a monolithic push_nsh action struct including the NSH base header
and variable length opaque context headers. That avoids any logic in the 
kernel datapath to translate a nested action attribute into an internal format
with pre-populated header. The user space does that work.

Should we consider to stick to that simple and efficient approach? As far 
As I can see it will be generic for the foreseeable future.

> > > These checks should be done only once when the action is configured, not 
> > > for
> > > each packet.
> >
> > I don't know how to implement a batch processing in kernel data path, it
> > seems there isn't such code for reference.
> 
> The checks should be done somewhere in __ovs_nla_copy_action. Which
> seems to be done even in your patch by nsh_key_put_from_nlattr
> (I didn't get that far during the review last week. The names of
> those nsh_(key|hdr)_*_nlattr are so similar that it's impossible to
> tell what they do without looking at the call sites - something you
> should also consider to improve). That makes it even more wrong: you
> appear to check everything twice, first on configuration and then for
> every packet.

Agree. Validation should only happen at action configuration time.

BR, Jan


RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-04 Thread Jan Scheurich
> >> Does it makes sense to keep the context headers as part of the flow?
> >> What is the reasoning behind it? With mdtype 2 headers this might either
> >> not work very well or will increase sw_flow_key size causing slowdowns
> >> for all protocols.
> >
> > For userspace, miniflow can handle such issue, but kernel data path
> > didn't provide such mechanism, so I don't think we can think of a better
> > way to fix this.
> 
> I do think that both, kernel and dpdk dp, are fine if you don't match on
> context fields, which I think is the common case.

Agree

> As soon as a few paths start to match on contexts at different offsets I
> am not sure if miniflow will actually help. I don't know enough about
> that. Set cover is a NP hard problem, so constructing plain simple flows
> will be an approximation somewhere anyway at some point.
> 
> If the context values are in some way discrete it might make sense, but
> for load balancing purposes your ingress router might fill out one
> context field with a flow hash of the inner flow to allow ECMP or
> loadbalancing. You might want to treat that separately. Otherwise the
> different paths might always need to context[3] & 0x3 (bits as number of
> possible next hops) and put that into the flow state. Because every hop
> and every path might have different numbers of nexthops etc., I don't
> think this is easy unifiable anymore.

Yes, that would become a scaling challenge for the datapaths and also for 
schemes to off-load that datapath to HW. I believe it requires discipline
on the controller side to not let the number of needed masks explode for
OVS.

> > I believe every newly-added key will result in similiar issue you
> > concern, maybe it will be better to introduce miniflow in kernel data
> > path, but it isn't in scope of this patch series.
> 
> You will see the same problem with offloading flows e.g. to network
> cards very soon. Flow explosion here will cause your 100Gbit/s NIC
> suddenly to go to software slow path.
> 
> > It will be great if you can have a better proposal to fix your concern.
> 
> I do think that something alike miniflow might make sense, but I don't
> know enough about miniflow.

Struct miniflow in the netdev datapath is merely a compact representation
of struct flow. struct flow is partitioned in to 64-bit "stripes" and struct
miniflow only stores those 64-bit bit stripes that have a non-zero mask stripe.
It reduces the memory footprint for storing packet flow and megaflow entries
but does not provide any megaflow lookup performance as such.

> New ACTIONS, that are not only bound to NSH, seem appealing to me at the
> moment. E.g. have a subtable match in the kernel dp or allowing for some
> OXM % modulo -> select action or neighbor alike thing would probably
> cover a lot of cases in the beginning. The submatch would probably very
> much reassemble the miniflow in dpdk dp.

I would like to discuss your ideas to optimize scalable load-sharing in OVS.
But I think we should do that in a separate mail thread to let the immediate
NSH kernel datapath work proceed. 

Can you  provide some more details of what you suggest? Perhaps take that 
on the ovs-dev mailing list.

BR, Jan


RE: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-04 Thread Jan Scheurich
> >> >> Does it makes sense to keep the context headers as part of the flow?
> >> >> What is the reasoning behind it? With mdtype 2 headers this might
> >> >> either not work very well or will increase sw_flow_key size causing
> >> >> slowdowns for all protocols.
> >> > [Mooney, Sean K]
> >> > Having the nsh context headers in the flow is quite useful It would
> >> > allow loadblancing on values stored in the context headers Or other
> >> > use. I belive odl previously used context header 4 to store a Flow id
> >> > so this could potentialy be used with the multipath action to have
> >> ovs
> >> > Choose between several possible next hops in the chain.
> >>
> >> In OVS, masks are a list(!) for matching. How can this work for
> >> different paths that might require different masks? If they can't be
> >> unified you even get exact matches. Thus, for OVS the context should
> >> not be part of the flow.
> 
> > [Mooney, Sean K] I'm not sure what you mean about a list as I never
> > made reference to one.  md type 1 context headers are 4 mandatory 32
> > bit field.
> 
> The semantic of the context fields depend on the path id. Every path can
> have their own interpretation of context fields.
> 
> Thus the paths will cetainly have their own masks. I hope I understood
> this part of the standard correctly.
> 
> dp only supports installing (approximated) disjoint megaflows and this
> affects performance a lot. Every new mask that gets added to the dp will
> be installed into a list which will be walked sequentially for
> packets. This will kill performance.
> 
> I assume that user space slows down, too, depending on the algorithm
> they use generate megaflows.

The primary use case for OVS in SFC/NSH is SFF. In almost all SFF use 
cases OVS will not need to match on context headers. The ability of OVS
to match on context headers should not in general slow down the datapath.

When SFC controllers match on parts of the context headers (e.g. in the 
final SFF or for load-balancing purposes), we will get additional megaflow 
masks. This is a fact of life in OVS and nothing new in NSH. I don't think
this should prevent us from supporting valid use cases in OVS.

The slow-down of the megaflow lookup in the datapath with the number 
of masks has been addressed in the user-space datapath with sorting the
mask lists according to hit frequency and maintaining one sorted lists per 
ingress port. There is further work in progress to improve on this with a
cuckoo hash to pick the most likely mask first.
It should be possible to apply similar optimization schemes also in the
Kernel datapath.

> > form an ovs context they should be treated the same as the 32 bit register 
> > fields.
> > We do not need to necessarily support those in md type 2 as all metadata is 
> > optional.

Support for matching on or updating MD2 TLV context headers is FFS in a
future OVS release. We do not foresee to add MD2 TLVs explicitly to the 
flow struct.

> > You can also see that context header 1 and 2 are still used in the newer 
> > odl netvirt sfc classifier implementation
> > https://github.com/opendaylight/netvirt/blob/ba22f7cf19d8a827d77a3391a7f654344ade43d8/docs/specs/new-sfc-
> classifier.rst#pipeline-changes
> > so for odl existing nsh support to work with md type one we must have the 
> > ability to set the nsh context headers
> > and should have the ability to match on them also for load lancing between 
> > service function in service function group.
> 
> As I said, a separate action sounds much more useful.

I don't think it is a good approach in OVS to introduce custom actions for NSH, 
e.g. for load sharing based on context header data. The primary tool for load 
sharing in OpenFlow is the Select Group. OVS already support an extension to 
the OF 1.5 Group Mod message to specify which fields to hash on. This can be 
used to hash on the relevant NSH context headers.

BR, Jan


RE: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support

2017-09-01 Thread Jan Scheurich
> > [Mooney, Sean K]
> > Having the nsh context headers in the flow is quite useful It would
> > allow loadblancing on values stored in the context headers Or other
> > use. I belive odl previously used context header 4 to store a Flow id
> > so this could potentialy be used with the multipath action to have ovs
> > Choose between several possible next hops in the chain.
> 
> In OVS, masks are a list(!) for matching. How can this work for different
> paths that might require different masks? If they can't be unified you even
> get exact matches. Thus, for OVS the context should not be part of the
> flow.

The NSH support in OVS 2.8 (for the user-space datapath only, so far) supports 
matching on and manipulating the fixed size MD1 context headers C1-C4. They are 
part of the flow and there are corresponding OXM fields defined. It is up to 
the SDN controller to program pipelines that match on or set these fields. 

The goal was to support all relevant NSH use cases for MD1: Classifier, SFF, 
and (with certain limitations) NSH proxy, and SF.

We also support MD2 TLV context headers but not yet for matching and setting, 
so MD2 TLVs are not part of the flow. OVS 2.8 can add MD2 context TLVs with the 
encap(nsh) action (classifier use case), can transparently forward MD2 headers 
(SFF use case) and pop an NSH header with MD2 context (final SFF use case). 
Support for matching and setting MD2 TLVs is FFS and can be added in a later 
release.

BR, Jan







RE: [PATCH net-next v4] openvswitch: enable NSH support

2017-08-22 Thread Jan Scheurich
> > If I understand correctly, this is a default definition that can be
> > overridden by drivers so that we still cannot rely on the Ethernet
> > payload always being 32-bit-aligned.
> 
> Not by drivers, by architectures. Only architectures that can efficiently
> handle unaligned access (or on which the cost of handling unaligned access
> is lower than the cost of unaligned DMA access) set NET_IP_ALIGN to 0.

Thanks, got it!

> 
> > Furthermore, there seem to be machine architectures that cannot handle
> > misaligned 32-bit access at all (not even with a performance penalty).
> 
> Those leave NET_IP_ALIGN to 2.

Dito

> 
> > Or why else does OVS user space code take so great pain to model
> > possible misalignment and provide/use safe access functions?
> 
> I don't know how the ovs user space deals with packet allocation. In the
> kernel, the network header is aligned in a way that it allows efficient 32bit
> access.

It seems that OVS has not had the same approach as Linux. There is no config 
parameter covering the alignment characteristics of the machine architecture. 
For packets buffers received from outside sources (e.g. DPDK interfaces) they 
make no assumptions on alignment and play safe. For packets allocated inside 
OVS, the Ethernet packet is typically stored so that the L3 header is 32-bit 
aligned, so that the misalignment precautions would be unnecessary. But I 
didn't check all code paths.




RE: [PATCH net-next v4] openvswitch: enable NSH support

2017-08-21 Thread Jan Scheurich
> > NSH can be carried over Ethernet with a 14 byte header. In that case
> > the total NSH header would typically be 16-bit aligned, so that all
> > 32-bit members would be misaligned.
> 
> See NET_IP_ALIGN in include/linux/skbuff.h.

If I understand correctly, this is a default definition that can be overridden 
by drivers so that we still cannot rely on the Ethernet payload always being 
32-bit-aligned. 

Furthermore, there seem to be machine architectures that cannot handle 
misaligned 32-bit access at all (not even with a performance penalty).

Or why else does OVS user space code take so great pain to model possible 
misalignment and provide/use safe access functions?

Does Linux kernel code generally ignore this risk?

/Jan


RE: [PATCH net-next v4] openvswitch: enable NSH support

2017-08-21 Thread Jan Scheurich
> > struct nsh_hdr {
> > ovs_be16 ver_flags_ttl_len;
> > uint8_t mdtype;
> > uint8_t np;
> > ovs_16aligned_be32 path_hdr;
> > union {
> > struct nsh_md1_ctx md1;
> > struct nsh_md2_tlv md2[];
> 
> I'm not that sure about this. With each member of md2 having a different
> size, you can't use md2 as an array. However, if it was declared as an
> array, it might encourage such (wrong) usage.
> 
> In particular, nsh_hdr->md2[1] is always wrong.
> 
> It seems better to not declare md2 as an array.

I understand your concern. But not declaring md2 as array is wrong as well, as 
there might not be an MD2 TLV context header. An MD2 NSH header is perfectly 
valid without any TLV. So in any case the user of the struct needs to be aware 
of the NSH semantics.

> >
> > I wonder about the possible 16-bit alignment of the 32-bit fields,
> > though. How is that handled in the kernel?
> 
> get_unaligned_*
> 
> > Also struct nsh_md1_ctx
> > has 32-bit members, which might not be 32-bit aligned in the packet.
> 
> I don't see that happening, it seems the header before md1 is 8 bytes and
> sizeof(md1) is 32 bytes? And for md2, the standard mandates that the md2
> size is a multiply of 4 bytes, too.

NSH can be carried over Ethernet with a 14 byte header. In that case the total 
NSH header would typically be 16-bit aligned, so that all 32-bit members would 
be misaligned.




RE: [PATCH net-next v4] openvswitch: enable NSH support

2017-08-21 Thread Jan Scheurich
> 
> The second member of the union should be a variable length array [] of
> struct nsh_md2_tlv
> 
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t mdtype;
> uint8_t np;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2[];
> };
> };
> 
> That was the original design before Ben removed it due to missing support
> in Microsoft compiler.
> For the Kernel datapath we should go back to that.
> 
> I wonder about the possible 16-bit alignment of the 32-bit fields, though.
> How is that handled in the kernel? Also struct nsh_md1_ctx has 32-bit
> members, which might not be 32-bit aligned in the packet.
> 

One afterthought: 

I think it would be good to add another member to the union to represent 
unstructured context headers as used, e.g., in the context of push_nsh.

struct nsh_hdr {
ovs_be16 ver_flags_ttl_len;
uint8_t mdtype;
uint8_t np;
ovs_16aligned_be32 path_hdr;
union {
uint8_t ctx_headers[];
struct nsh_md1_ctx md1;
struct nsh_md2_tlv md2[];
};
};




RE: [PATCH net-next v4] openvswitch: enable NSH support

2017-08-21 Thread Jan Scheurich
> struct nsh_hdr {
> ovs_be16 ver_flags_ttl_len;
> uint8_t mdtype;
> uint8_t np;
> ovs_16aligned_be32 path_hdr;
> union {
> struct nsh_md1_ctx md1;
> struct nsh_md2_tlv md2;
> };
> };
> 

The second member of the union should be a variable length array [] of struct 
nsh_md2_tlv

struct nsh_hdr {
ovs_be16 ver_flags_ttl_len;
uint8_t mdtype;
uint8_t np;
ovs_16aligned_be32 path_hdr;
union {
struct nsh_md1_ctx md1;
struct nsh_md2_tlv md2[];
};
};

That was the original design before Ben removed it due to missing support in 
Microsoft compiler. 
For the Kernel datapath we should go back to that. 

I wonder about the possible 16-bit alignment of the 32-bit fields, though. How 
is that handled in the kernel? Also struct nsh_md1_ctx has 32-bit members, 
which might not be 32-bit aligned in the packet.

BR, Jan


RE: [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Jan Scheurich
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Monday, 14 August, 2017 12:48
> 
> On Mon, 14 Aug 2017 10:35:42 +, Jan Scheurich wrote:
> > Is it worth to speculate on how a hypothetical future NSH version
> > (with a different Version value in the Base header) might look like?
> 
> Absolutely. This is uAPI we're talking about and once merged, it's set
> in stone. Whatever we come up with should allow future extensibility.
> 
> > If this should ever arise, we could introduce a new push_nsh_v2
> > action.
> 
> Which would mean we failed with the design. We would have to maintain
> two parallel APIs forever. This is not how the design should be made.

Well, if that is the ultimate design goal, we'll have no choice.

But if the hypothetic next NSH version looks entirely different, we will have 
to manage the incompatibility anyhow on the level of the nested attributes. So 
the only thing we will for sure manage to keep fixed might be the empty wrapper 
OVS_ACTION_ATTR_PUSH_NSH...

We decided earlier on to go for dedicated push/pop_ actions in the 
datapath instead of a single pair of (very polymorphic) generic encap/decap 
datapath actions to make the implementation in the datapath more explicit and 
straightforward. Why not follow the same philosophy also when we need to 
support push/pop for incompatible versions of the same protocol?

Jan



RE: [PATCH net-next v2] openvswitch: enable NSH support

2017-08-14 Thread Jan Scheurich
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Monday, 14 August, 2017 09:51
> 
> On Sun, 13 Aug 2017 21:13:57 +, Jan Scheurich wrote:
> > Jiri, I am not too familiar with conventions on the OVS netlink
> > interface regarding the handling of variable length fields. What is
> > the benefit of structuring the push_nsh action into
> >
> > OVS_ACTION_ATTR_PUSH_NSH
> > +-- OVS_ACTION_ATTR_NSH_BASE_HEADER
> > +-- OVS_ACTION_ATTR_NSH_MD
> >
> > instead of grouping the base header fields and the variable length MD
> > into a single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the
> > main concern a potential future need to add additional parameters to
> > the push_nsh datapath action? Are there examples for such structured
> > actions other than OVS_ACTION_ATTR_CT where the need is obvious
> > because it is very polymorphic?
> 
> This is about having the design clean. What would you do if you had two
> variable length fields? Would you still put them in a single structure
> and have a length field in the structure, too? That would be wrong, we
> have length in the attribute header. I doubt you would do that. Which
> indicates that putting variable length fields to an attribute with
> anything other is wrong.
> 
> Also, look at how ugly the code would be. You'd have to subtract the
> base header length from the attribute length to get the variable data
> length. That's not nice at all.
> 
> Think about the netlink in the way that by default, every field should
> have its own attribute. Structures are included only for performance
> reasons where certain fields are always passed in a message and having
> them in separate attributes would be impractical and waste of space.
> Going out of your way to include the context data in the structure thus
> doesn't make sense.
> 
> > BTW:  The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading
> because
> > in the NSH draft the term base header is used for the first 32-bit
> > word, whereas here it includes also the 32-bit Service Path header.
> 
> An excellent comment. This means that it's very well possible that
> future NSH versions may not include SP header or may have it of a
> different size. Maybe we should consider putting it into a separate
> attribute, too? Not sure it is needed, though, I don't think it's
> likely to happen.

I think the NSH RFC draft is pretty clear that the NSH header (Version 0) will 
always contain the Base Header, the Service Header and a (possibly empty) set 
of Context Headers. The length of the context headers is implied by the total 
NSH header Length in the Base Header. The internal structure of the context 
headers, implied by the "MD type" in the base header, is irrelevant for the 
push_nsh action as it is managed by the ofproto-dpif layer. The current NSH 
header format simply does not allow for a second variable length section in the 
header.

Is it worth to speculate on how a hypothetical future NSH version (with a 
different Version value in the Base header) might look like? If this should 
ever arise, we could introduce a new push_nsh_v2 action. 

In summary, I'm still not convinced that we gain any significant generality by 
splitting up the OVS_ACTION_ATTR_PUSH_NSH action into nested attributes. It 
will only slow down the execution of the action in the datapath (or requires 
some optimization in the datapath to merge the attributes into a single 
internal action struct).

Note: For the OVS_KEY_ATTR_NSH we should split up in to 
OVS_KEY_ATTR_NSH_FIXED_HEADER (covering Base and Service headers) and 
OVS_KEY_ATTR_NSH_MD1.

Jan


RE: [PATCH net-next v2] openvswitch: enable NSH support

2017-08-13 Thread Jan Scheurich
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Friday, 11 August, 2017 12:23
> 
> On Fri, 11 Aug 2017 10:09:36 +, Jan Scheurich wrote:
> > Unless someone can explain to me why the datapath should understand the
> > internal structure/format of metadata in push_nsh, I would strongly
> > vote to keep the metadata as variable length octet sequence in the
> > non-structured OVS_ACTION_ATTR_PUSH_NSH
> 
> Could be but it still needs to be in a different attribute and not in
> the ovs_action_push_nsh structure.
> 
> Separate attributes for MD1/MD2 has the advantage of easier validation:
> with a separate MD1 type attribute, the size check is easier. With an
> unstructured MD attribute, we'd need to look into the
> OVS_ACTION_ATTR_NSH_BASE_HEADER attribute for mdtype and then validate
> the unstructured MD attribute size manually. Not a big deal, though.
> I don't have strong opinion here.
> 
> But I do have strong opinion that MD needs to go into a separate
> attribute, whether there are separate attributes for MD1/2 or not.

Jiri, I am not too familiar with conventions on the OVS netlink interface 
regarding the handling of variable length fields. What is the benefit of 
structuring the push_nsh action into

OVS_ACTION_ATTR_PUSH_NSH
+-- OVS_ACTION_ATTR_NSH_BASE_HEADER
+-- OVS_ACTION_ATTR_NSH_MD

instead of grouping the base header fields and the variable length MD into a 
single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the main concern a 
potential future need to add additional parameters to the push_nsh datapath 
action? Are there examples for such structured actions other than 
OVS_ACTION_ATTR_CT where the need is obvious because it is very polymorphic?

BR, Jan

BTW:  The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading because in the NSH 
draft the term base header is used for the first 32-bit word, whereas here it 
includes also the 32-bit Service Path header.


RE: [PATCH net-next v2] openvswitch: enable NSH support

2017-08-11 Thread Jan Scheurich
> -Original Message-
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Friday, 11 August, 2017 11:45
> 
> The context field does not apply to MD type 2. It looks wrong for the
> context field to be included in netlink attribute for anything other
> than MD type 1. Perhaps it needs to be put into a separate attribute,
> too?
> 
> Note that I'm talking only about the uAPI. Internally, ovs can use
> struct ovs_key_nsh that is MD type 1 only, there's no problem changing
> that later. But for the user space interface, this needs to be clean.
> This can be solved for example this way:
> 
> In include/uapi/linux/openvswitch.h:
> 
> struct ovs_key_nsh_base {
>   __u8 flags;
>   __u8 mdtype;
>   __u8 np;
>   __u8 pad;
>   __be32 path_hdr;
> };
> 
> + one more netlink attribute carrying MD type 1 info. Will probably
> require to change OVS_KEY_ATTR_NSH to a nested attribute etc.
> 
> In net/openvswitch/flow.h (or perhaps a different header would be more
> appropriate?):
> 
> struct ovs_key_nsh {
>   struct ovs_key_nsh_base base;
>   __be32 context[4];
> };
> 
> Plus needed conversions between OVS_KEY_ATTR_NSH and struct ovs_key_nsh
> when interfacing between the kernel and user space.
> 
> That way, we can have MD type 1 support only for now while still being
> allowed to redesign things in whatever way later.
> 
>  Jiri

For the OVS_KEY_ATTR_NSH I agree to move the fixed size MD1 context headers 
from nsh_base to a separate struct and use nested netlink attributes.

For OVS_ACTION_ATTR_PUSH_NSH attribute any metadata included is opaque to the 
datapath and should be copied as is into the packet header. I doubt that there 
is any benefit to model this with nested attributes for MD1 or MD2. This just 
adds complexity on sender and receiver side and requires updates in case there 
should be other MD types added later on.

Unless someone can explain to me why the datapath should understand the 
internal structure/format of metadata in push_nsh, I would strongly vote to 
keep the metadata as variable length octet sequence in the non-structured 
OVS_ACTION_ATTR_PUSH_NSH

BR, Jan


RE: [ovs-dev] [PATCH net-next] openvswitch: add NSH support

2017-08-09 Thread Jan Scheurich
Hi all,

In OVS 2.8 we support only fixed size NSH MD1 context data for matching and in 
set/copy_field actions. OVS parses an MD2 NSH header but does not make any TLV 
headers available to OF. The plan is to add support for matching/manipulating 
NSH MD2 TLVs through a new infrastructure of generic TLV match fields that can 
be dynamically mapped to specific protocol TLVs, similar to the way this is 
done for GENEVE tunnel metadata TLVs today. But this is work for an upcoming 
OVS release.

However, in encap() and decap() NSH actions we do support MD2 format already. 
The encap_nsh datapath action is agnostic of the MD format. Any MD2 TLV 
metadata are provided as encap properties in the OF encap() operation. They are 
translated by the ofproto layer and forwarded as opaque byte sequence in the 
encap_nsh datapath action.

Conversely, the decap_nsh() action pops any TLV metadata using the metadata 
length in the NSH header.

Consequently the datapath action OVS_ACTION_ATTR_ENCAP_NSH is already declared 
variable length:

odp_action_len(uint16_t type)
{
switch ((enum ovs_action_attr) type) {
...
case OVS_ACTION_ATTR_ENCAP_NSH: return ATTR_LEN_VARIABLE;
case OVS_ACTION_ATTR_DECAP_NSH: return 0;
...
}

Unfortunately, that has only partially been reflected in the rest of the code. 
The action struct should have a variable length metadata[] member and the 
function odp_put_encap_nsh_action() should set the action nl_attr length 
dynamically.

I'll provide a patch to fix that shortly.

BTW: I have no objections to renaming these datapath actions to push_nsh and 
pop_nsh if that helps avoiding confusion with the existing encap attributes on 
the netlink interface. But we should do that quickly as it is user-visible and 
affects unit test cases.

BR, Jan


> -Original Message-
> From: ovs-dev-boun...@openvswitch.org 
> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff
> Sent: Wednesday, 09 August, 2017 04:42
> To: Yang, Yi Y 
> Cc: d...@openvswitch.org; netdev@vger.kernel.org; Jiri Benc 
> ; da...@davemloft.net
> Subject: Re: [ovs-dev] [PATCH net-next] openvswitch: add NSH support
> 
> To be clear, the OVS implementation is a placeholder.  It will get
> replaced by whatever netdev implements, and that's OK.  I didn't focus
> on making it perfect because I knew that.  Instead, I just made sure it
> was good enough for an internal OVS implementation that doesn't fix any
> ABI or API.  OVS can even change the user-visible action names, as long
> as we do that soon (and encap/decap versus push/pop doesn't matter to
> me).
> 
> The considerations for netdev are different and more permanent.
> 
> On Wed, Aug 09, 2017 at 02:05:12AM +, Yang, Yi Y wrote:
> > Hi, Jiri
> >
> > Thank you for your comments.
> >
> > __be32 c[4] is the name Ben Pfaff suggested, the original name is c1, c2, 
> > c3, c4, they are context data, so c seems ok, too :-)
> >
> > OVS has merged it and has the same name, maybe the better way is adding 
> > comment /* Context data */ after it.
> >
> > For MD type 2, struct ovs_key_nsh is very difficult to cover it, so far we 
> > don't know how to support MD type 2 better, Geneve defined 64
> tun_metadata0-63 to handle this, those keys are parts of struct flow_tnl, the 
> highest possibility is to reuse those keys.
> >
> > So for future MD type 2, we will have two parts of keys, one is from struct 
> > ovs_key_nsh, another is from struct flow_tnl, this won't break
> the old uAPI.
> >
> > "#define OVS_ENCAP_NSH_MAX_MD_LEN 16" is changed per Ben's comment from 
> > 256, Ben thinks 256 is too big but we only support
> MD type 1 now. We still have ways to extend it, for example:
> >
> > struct ovs_action_encap_nsh * oaen = (struct ovs_action_encap_nsh *) malloc 
> > (sizeof(struct ovs_action_encap_nsh) + ANY_SIZE);
> >
> > nl_msg_put_unspec(actions, OVS_ACTION_ATTR_ENCAP_NSH,
> >   oaen, sizeof(struct ovs_action_encap_nsh) + 
> > ANY_SIZE);
> >
> > In addition, we also need to consider, OVS userspace code must be 
> > consistent with here, so keeping it intact will be better, we have way
> to support dynamically extension when we add MD type 2 support.
> >
> > About action name, unfortunately, userspace data plane has named them as 
> > encap_nsh & decap_nsh, Jan, what do you think about Jiri's
> suggestion?
> >
> > But from my understanding, encap_* & decap_* are better because they 
> > corresponding to generic encap & decap actions, in addition,
> encap semantics are different from push, encap just pushed an empty header 
> with default values, users must use set_field to set the
> content of the header.
> >
> > Again, OVS userspace code must be consistent with here, so keeping it 
> > intact will be better because OVS userspace code was there.
> >
> >
> > -Original Message-
> > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On 
> > Behalf Of Jiri Benc
> > Sent: Tuesday,