Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-14 Thread Michał Mirosław
On Tue, Dec 13, 2016 at 05:21:18PM -0800, Stephen Hemminger wrote:
> On Sat,  3 Dec 2016 10:22:28 +0100 (CET)
> Michał Mirosław  wrote:
> 
> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > intact through linux networking stack.
> > 
> > Signed-off-by: Michał Mirosław 
> > ---
> > 
> > Dear NetDevs
> > 
> > I guess this needs to be split to the prep..convert[]..finish sequence,
> > but if you like it as is, then it's ready.
> > 
> > The biggest question is if the modified interface and vlan_present
> > is the way to go. This can be changed to use vlan_proto != 0 instead
> > of an extra flag bit.
> > 
> > As I can't test most of the driver changes, please look at them carefully.
> > OVS and bridge eyes are especially welcome.
> > 
> > Best Regards,
> > Michał Mirosław
> 
> Is the motivation to support 802.1ad Drop Eligability Indicator (DEI)?
> 
> If so then you need to be more verbose in the commit log, and lots more
> work is needed. You need to rename fields and validate every place a
> driver is using DEI bit to make sure it really does the right thing
> on that hardware. It is not just a mechanical change.

There are not many mentions of CFI bit in the Linux tree. Places that
used it as VLAN_TAG_PRESENT are fixed with this patchset. Other uses are:

 - VLAN code: ignored
 - ebt_vlan: ignored
 - OVS: cleared because of netlink API assumptions
 - DSA: transferred to/from (E)DSA tag
 - drivers: gianfar: uses properly in filtering rules
 - drivers: cnic: false-positive (uses only VLAN ID, CFI bit marks the field 
'valid')
 - drivers: qedr: false-positive (like cnic)

So unless there is something hidden in the hardware, no driver does anything
special with the CFI bit.

After this patchset only OVS will need further modifications to be able to
support handling of DEI bit.

Best Regards,
Michał Mirosław


Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-13 Thread Alexei Starovoitov
On Tue, Dec 13, 2016 at 6:03 PM, Michał Mirosław
 wrote:
> On Tue, Dec 13, 2016 at 05:21:18PM -0800, Stephen Hemminger wrote:
>> On Sat,  3 Dec 2016 10:22:28 +0100 (CET)
>> Michał Mirosław  wrote:
>> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
>> > intact through linux networking stack.
>> >
>> > Signed-off-by: Michał Mirosław 
>> > ---
>> >
>> > Dear NetDevs
>> >
>> > I guess this needs to be split to the prep..convert[]..finish sequence,
>> > but if you like it as is, then it's ready.
>> >
>> > The biggest question is if the modified interface and vlan_present
>> > is the way to go. This can be changed to use vlan_proto != 0 instead
>> > of an extra flag bit.
>> >
>> > As I can't test most of the driver changes, please look at them carefully.
>> > OVS and bridge eyes are especially welcome.
>> >
>> > Best Regards,
>> > Michał Mirosław
>> Is the motivation to support 802.1ad Drop Eligability Indicator (DEI)?
>>
>> If so then you need to be more verbose in the commit log, and lots more
>> work is needed. You need to rename fields and validate every place a
>> driver is using DEI bit to make sure it really does the right thing
>> on that hardware. It is not just a mechanical change.
>
> My main motivation is to be able to see the bit intact in tcpdump and be
> able to pass it untouched through at least a veth pair. It would be great
> if all devices didn't do something stupid with the bit, but it's not
> something I am able to make happen.

imo "be able to pass untouched through veth" is not good enough
justification for such invasive patches.
I'm still not sure that all of these changes don't affect user space.


Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-13 Thread Michał Mirosław
On Tue, Dec 13, 2016 at 05:21:18PM -0800, Stephen Hemminger wrote:
> On Sat,  3 Dec 2016 10:22:28 +0100 (CET)
> Michał Mirosław  wrote:
> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > intact through linux networking stack.
> > 
> > Signed-off-by: Michał Mirosław 
> > ---
> > 
> > Dear NetDevs
> > 
> > I guess this needs to be split to the prep..convert[]..finish sequence,
> > but if you like it as is, then it's ready.
> > 
> > The biggest question is if the modified interface and vlan_present
> > is the way to go. This can be changed to use vlan_proto != 0 instead
> > of an extra flag bit.
> > 
> > As I can't test most of the driver changes, please look at them carefully.
> > OVS and bridge eyes are especially welcome.
> > 
> > Best Regards,
> > Michał Mirosław
> Is the motivation to support 802.1ad Drop Eligability Indicator (DEI)?
> 
> If so then you need to be more verbose in the commit log, and lots more
> work is needed. You need to rename fields and validate every place a
> driver is using DEI bit to make sure it really does the right thing
> on that hardware. It is not just a mechanical change.

My main motivation is to be able to see the bit intact in tcpdump and be
able to pass it untouched through at least a veth pair. It would be great
if all devices didn't do something stupid with the bit, but it's not
something I am able to make happen.

Best Regards,
Michał Mirosław


Re: [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-13 Thread Stephen Hemminger
On Sat,  3 Dec 2016 10:22:28 +0100 (CET)
Michał Mirosław  wrote:

> This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> intact through linux networking stack.
> 
> Signed-off-by: Michał Mirosław 
> ---
> 
> Dear NetDevs
> 
> I guess this needs to be split to the prep..convert[]..finish sequence,
> but if you like it as is, then it's ready.
> 
> The biggest question is if the modified interface and vlan_present
> is the way to go. This can be changed to use vlan_proto != 0 instead
> of an extra flag bit.
> 
> As I can't test most of the driver changes, please look at them carefully.
> OVS and bridge eyes are especially welcome.
> 
> Best Regards,
> Michał Mirosław

Is the motivation to support 802.1ad Drop Eligability Indicator (DEI)?

If so then you need to be more verbose in the commit log, and lots more
work is needed. You need to rename fields and validate every place a
driver is using DEI bit to make sure it really does the right thing
on that hardware. It is not just a mechanical change.


Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-05 Thread Ben Pfaff
On Mon, Dec 05, 2016 at 11:52:47PM +0100, Michał Mirosław wrote:
> On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> > On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be 
> > > > > passed
> > > > > intact through linux networking stack.
> > > > This appears to change the established Open vSwitch userspace API.  You
> > > > can see that simply from the way that it changes the documentation for
> > > > the userspace API.  If I'm right about that, then this change will break
> > > > all userspace programs that use the Open vSwitch kernel module,
> > > > including Open vSwitch itself.
> > > 
> > > If I understood the code correctly, it does change expected meaning for
> > > the (unlikely?) case of header truncated just before the VLAN TCI - it 
> > > will
> > > be impossible to differentiate this case from the VLAN TCI == 0.
> > > 
> > > I guess this is a problem with OVS API, because it doesn't directly show
> > > the "missing" state of elements, but relies on an "invalid" value.
> > 
> > That particular corner case should not be a huge problem in any case.
> > 
> > The real problem is that this appears to break the common case use of
> > VLANs in Open vSwitch.  After this patch, parse_vlan() in
> > net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> > accelerated version of them or the version in the skb data) into
> > sw_flow_key members.  OK, that's fine on it's own.  However, I don't see
> > any corresponding change to the code in flow_netlink.c to compensate for
> > the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> > was always required to be set to 1 in flow matches inside Netlink
> > messages sent from userspace, and the kernel always set it to 1 in
> > corresponding messages sent to userspace.
> > 
> > In other words, if I'm reading this change correctly:
> > 
> > * With a kernel before this change, userspace always had to set
> >   VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
> >   reject the flow match.
> > 
> > * With a kernel after this change, userspace must not set
> >   VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
> >   match but nothing will ever match because packets do not actually
> >   have the CFI bit set.
> > 
> > Take a look at this code that the patch deletes from
> > validate_vlan_from_nlattrs(), for example, and see how it insisted that
> > VLAN_TAG_PRESENT was set:
> > 
> > if (!(tci & htons(VLAN_TAG_PRESENT))) {
> > if (tci) {
> > OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT 
> > bit set.",
> >   (inner) ? "C-VLAN" : "VLAN");
> > return -EINVAL;
> > } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> > /* Corner case for truncated VLAN header. */
> > OVS_NLERR(log, "Truncated %s header has non-zero encap 
> > attribute.",
> >   (inner) ? "C-VLAN" : "VLAN");
> > return -EINVAL;
> > }
> > }
> > 
> > Please let me know if I'm overlooking something.
> 
> Hmm. So the easiest change without disrupting current userspace, would be
> to flip the CFI bit on the way to/from OVS userspace. Does this seem
> correct?

That sounds correct.  (The bit should not be flipped in the mask.)

Thanks,

Ben.


Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-05 Thread Michał Mirosław
On Mon, Dec 05, 2016 at 10:55:45AM -0800, Ben Pfaff wrote:
> On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> > On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > > intact through linux networking stack.
> > > This appears to change the established Open vSwitch userspace API.  You
> > > can see that simply from the way that it changes the documentation for
> > > the userspace API.  If I'm right about that, then this change will break
> > > all userspace programs that use the Open vSwitch kernel module,
> > > including Open vSwitch itself.
> > 
> > If I understood the code correctly, it does change expected meaning for
> > the (unlikely?) case of header truncated just before the VLAN TCI - it will
> > be impossible to differentiate this case from the VLAN TCI == 0.
> > 
> > I guess this is a problem with OVS API, because it doesn't directly show
> > the "missing" state of elements, but relies on an "invalid" value.
> 
> That particular corner case should not be a huge problem in any case.
> 
> The real problem is that this appears to break the common case use of
> VLANs in Open vSwitch.  After this patch, parse_vlan() in
> net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
> accelerated version of them or the version in the skb data) into
> sw_flow_key members.  OK, that's fine on it's own.  However, I don't see
> any corresponding change to the code in flow_netlink.c to compensate for
> the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
> was always required to be set to 1 in flow matches inside Netlink
> messages sent from userspace, and the kernel always set it to 1 in
> corresponding messages sent to userspace.
> 
> In other words, if I'm reading this change correctly:
> 
> * With a kernel before this change, userspace always had to set
>   VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
>   reject the flow match.
> 
> * With a kernel after this change, userspace must not set
>   VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
>   match but nothing will ever match because packets do not actually
>   have the CFI bit set.
> 
> Take a look at this code that the patch deletes from
> validate_vlan_from_nlattrs(), for example, and see how it insisted that
> VLAN_TAG_PRESENT was set:
> 
>   if (!(tci & htons(VLAN_TAG_PRESENT))) {
>   if (tci) {
>   OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT 
> bit set.",
> (inner) ? "C-VLAN" : "VLAN");
>   return -EINVAL;
>   } else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
>   /* Corner case for truncated VLAN header. */
>   OVS_NLERR(log, "Truncated %s header has non-zero encap 
> attribute.",
> (inner) ? "C-VLAN" : "VLAN");
>   return -EINVAL;
>   }
>   }
> 
> Please let me know if I'm overlooking something.

Hmm. So the easiest change without disrupting current userspace, would be
to flip the CFI bit on the way to/from OVS userspace. Does this seem
correct?

Best Regards,
Michał Mirosław


Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-05 Thread Ben Pfaff
On Mon, Dec 05, 2016 at 06:24:36PM +0100, Michał Mirosław wrote:
> On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> > On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > > intact through linux networking stack.
> > This appears to change the established Open vSwitch userspace API.  You
> > can see that simply from the way that it changes the documentation for
> > the userspace API.  If I'm right about that, then this change will break
> > all userspace programs that use the Open vSwitch kernel module,
> > including Open vSwitch itself.
> 
> If I understood the code correctly, it does change expected meaning for
> the (unlikely?) case of header truncated just before the VLAN TCI - it will
> be impossible to differentiate this case from the VLAN TCI == 0.
> 
> I guess this is a problem with OVS API, because it doesn't directly show
> the "missing" state of elements, but relies on an "invalid" value.

That particular corner case should not be a huge problem in any case.

The real problem is that this appears to break the common case use of
VLANs in Open vSwitch.  After this patch, parse_vlan() in
net/openvswitch/flow.c copies the tpid and tci from sk_buff (either the
accelerated version of them or the version in the skb data) into
sw_flow_key members.  OK, that's fine on it's own.  However, I don't see
any corresponding change to the code in flow_netlink.c to compensate for
the fact that, until now, the VLAN CFI bit (formerly VLAN_TAG_PRESENT)
was always required to be set to 1 in flow matches inside Netlink
messages sent from userspace, and the kernel always set it to 1 in
corresponding messages sent to userspace.

In other words, if I'm reading this change correctly:

* With a kernel before this change, userspace always had to set
  VLAN_TAG_PRESENT to 1 to match on a VLAN, or the kernel would
  reject the flow match.

* With a kernel after this change, userspace must not set
  VLAN_TAG_PRESENT to 1, otherwise the kernel will accept the flow
  match but nothing will ever match because packets do not actually
  have the CFI bit set.

Take a look at this code that the patch deletes from
validate_vlan_from_nlattrs(), for example, and see how it insisted that
VLAN_TAG_PRESENT was set:

if (!(tci & htons(VLAN_TAG_PRESENT))) {
if (tci) {
OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT 
bit set.",
  (inner) ? "C-VLAN" : "VLAN");
return -EINVAL;
} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
/* Corner case for truncated VLAN header. */
OVS_NLERR(log, "Truncated %s header has non-zero encap 
attribute.",
  (inner) ? "C-VLAN" : "VLAN");
return -EINVAL;
}
}

Please let me know if I'm overlooking something.

Thanks,

Ben.


Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-05 Thread Michał Mirosław
On Sat, Dec 03, 2016 at 03:27:30PM -0800, Ben Pfaff wrote:
> On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> > This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> > intact through linux networking stack.
> > 
> > Signed-off-by: Michał Mirosław 
> > ---
> > 
> > Dear NetDevs
> > 
> > I guess this needs to be split to the prep..convert[]..finish sequence,
> > but if you like it as is, then it's ready.
> > 
> > The biggest question is if the modified interface and vlan_present
> > is the way to go. This can be changed to use vlan_proto != 0 instead
> > of an extra flag bit.
> > 
> > As I can't test most of the driver changes, please look at them carefully.
> > OVS and bridge eyes are especially welcome.
> 
> This appears to change the established Open vSwitch userspace API.  You
> can see that simply from the way that it changes the documentation for
> the userspace API.  If I'm right about that, then this change will break
> all userspace programs that use the Open vSwitch kernel module,
> including Open vSwitch itself.

If I understood the code correctly, it does change expected meaning for
the (unlikely?) case of header truncated just before the VLAN TCI - it will
be impossible to differentiate this case from the VLAN TCI == 0.

I guess this is a problem with OVS API, because it doesn't directly show
the "missing" state of elements, but relies on an "invalid" value.

I can probably change the code to mask this change in the OVS code (by keeping
the CFI/DEI bit unusable). This somehow feels wrong, though.

Best Regards,
Michał Mirosław


Re: [ovs-dev] [PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-03 Thread Ben Pfaff
On Sat, Dec 03, 2016 at 10:22:28AM +0100, Michał Mirosław wrote:
> This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
> intact through linux networking stack.
> 
> Signed-off-by: Michał Mirosław 
> ---
> 
> Dear NetDevs
> 
> I guess this needs to be split to the prep..convert[]..finish sequence,
> but if you like it as is, then it's ready.
> 
> The biggest question is if the modified interface and vlan_present
> is the way to go. This can be changed to use vlan_proto != 0 instead
> of an extra flag bit.
> 
> As I can't test most of the driver changes, please look at them carefully.
> OVS and bridge eyes are especially welcome.

This appears to change the established Open vSwitch userspace API.  You
can see that simply from the way that it changes the documentation for
the userspace API.  If I'm right about that, then this change will break
all userspace programs that use the Open vSwitch kernel module,
including Open vSwitch itself.


[PATCH net-next] net: remove abuse of VLAN DEI/CFI bit

2016-12-03 Thread Michał Mirosław
This All-in-one patch removes abuse of VLAN CFI bit, so it can be passed
intact through linux networking stack.

Signed-off-by: Michał Mirosław 
---

Dear NetDevs

I guess this needs to be split to the prep..convert[]..finish sequence,
but if you like it as is, then it's ready.

The biggest question is if the modified interface and vlan_present
is the way to go. This can be changed to use vlan_proto != 0 instead
of an extra flag bit.

As I can't test most of the driver changes, please look at them carefully.
OVS and bridge eyes are especially welcome.

Best Regards,
Michał Mirosław
---
 Documentation/networking/openvswitch.txt | 14 --
 arch/arm/net/bpf_jit_32.c| 14 +++---
 arch/mips/net/bpf_jit.c  | 17 +++
 arch/powerpc/net/bpf_jit_comp.c  | 14 +++---
 arch/sparc/net/bpf_jit_comp.c| 14 +++---
 drivers/infiniband/hw/cxgb4/cm.c |  2 +-
 drivers/infiniband/hw/i40iw/i40iw_cm.c   |  8 ++--
 drivers/net/ethernet/broadcom/cnic.c |  2 +-
 drivers/net/ethernet/emulex/benet/be_main.c  |  4 +-
 drivers/net/ethernet/freescale/gianfar_ethtool.c |  7 +--
 drivers/net/ethernet/ibm/ibmvnic.c   |  5 +-
 drivers/net/ethernet/marvell/sky2.c  |  4 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c   |  9 ++--
 drivers/net/hyperv/hyperv_net.h  |  2 +-
 drivers/net/hyperv/netvsc_drv.c  | 14 +++---
 drivers/net/hyperv/rndis_filter.c|  5 +-
 include/linux/if_vlan.h  | 23 ++---
 include/linux/skbuff.h   | 10 +++-
 lib/test_bpf.c   | 14 +++---
 net/8021q/vlan_core.c|  2 +-
 net/bridge/br_netfilter_hooks.c  | 14 +++---
 net/bridge/br_private.h  |  2 +-
 net/bridge/br_vlan.c |  6 +--
 net/core/dev.c   |  8 ++--
 net/core/filter.c| 17 +++
 net/core/skbuff.c|  2 +-
 net/ipv4/ip_tunnel_core.c|  2 +-
 net/netfilter/nfnetlink_queue.c  |  5 +-
 net/openvswitch/actions.c| 13 ++---
 net/openvswitch/flow.c   |  2 +-
 net/openvswitch/flow.h   |  4 +-
 net/openvswitch/flow_netlink.c   | 61 
 net/sched/act_vlan.c |  2 +-
 33 files changed, 152 insertions(+), 170 deletions(-)

diff --git a/Documentation/networking/openvswitch.txt 
b/Documentation/networking/openvswitch.txt
index b3b9ac6..e7ca27d 100644
--- a/Documentation/networking/openvswitch.txt
+++ b/Documentation/networking/openvswitch.txt
@@ -219,20 +219,6 @@ this:
 
 eth(...), eth_type(0x0800), ip(proto=6, ...), tcp(src=0, dst=0)
 
-As another example, consider a packet with an Ethernet type of 0x8100,
-indicating that a VLAN TCI should follow, but which is truncated just
-after the Ethernet type.  The flow key for this packet would include
-an all-zero-bits vlan and an empty encap attribute, like this:
-
-eth(...), eth_type(0x8100), vlan(0), encap()
-
-Unlike a TCP packet with source and destination ports 0, an
-all-zero-bits VLAN TCI is not that rare, so the CFI bit (aka
-VLAN_TAG_PRESENT inside the kernel) is ordinarily set in a vlan
-attribute expressly to allow this situation to be distinguished.
-Thus, the flow key in this second example unambiguously indicates a
-missing or malformed VLAN TCI.
-
 Other rules
 ---
 
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 93d0b6d..aff9dfa 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -915,17 +915,17 @@ static int build_body(struct jit_ctx *ctx)
emit(ARM_LDR_I(r_A, r_skb, off), ctx);
break;
case BPF_ANC | SKF_AD_VLAN_TAG:
-   case BPF_ANC | SKF_AD_VLAN_TAG_PRESENT:
ctx->seen |= SEEN_SKB;
BUILD_BUG_ON(FIELD_SIZEOF(struct sk_buff, vlan_tci) != 
2);
off = offsetof(struct sk_buff, vlan_tci);
emit(ARM_LDRH_I(r_A, r_skb, off), ctx);
-   if (code == (BPF_ANC | SKF_AD_VLAN_TAG))
-   OP_IMM3(ARM_AND, r_A, r_A, ~VLAN_TAG_PRESENT, 
ctx);
-   else {
-   OP_IMM3(ARM_LSR, r_A, r_A, 12, ctx);
-   OP_IMM3(ARM_AND, r_A, r_A, 0x1, ctx);
-   }
+   break;
+   case BPF_ANC | SKF_AD_VLAN_TAG_PRESENT:
+   off = PKT_VLAN_PRESENT_OFFSET();
+   emit(ARM_LDRB_I(r_A, r_skb, off), ctx);
+   if