Re: [ovs-dev] [patch_v7 0/9] Userspace Datapath: Introduce NAT support.

2017-04-29 Thread Daniele Di Proietto
Hi Darrell,

I took another look at the series and provided a few comments inline.

Other than those the patches look good to me, but I haven't looked at
every possible corner case :-)

Thanks,

Daniele

2017-03-24 2:15 GMT-07:00 Darrell Ball :
> This patch series introduces NAT support for the userspace datapath.
>
> The per packet scope of lookups for NAT and un_NAT is at
> the bucket level rather than global. One hash table is
> introduced to support create/delete handling. The create/delete
> events may be further optimized, if the need becomes clear.
>
> The existing NAT tests are enabled for the dpdk datapath,
> with some added enhancements.
>
> Some NAT options with limited utility (persistent, random) are
> not supported yet, but will be supported in a later patch.
>
> alg and fragmentation support are not included here but are
> being worked on.
>
> I realize patch 3 is big. It may be clearer and easier to keep
> as a single patch, so I have done that after some discussion.
>
> v6->v7: A couple patches were committed.
>
> Three new patches were added, including new
> orig tuple context recovery.
>
> Clean up the residual, deprecated batch-sorting array.
>
> Fix a potential ICMP issue.
>
> Add more complete ICMP related handling, originally
> scoped for another patch series.
>
> Splice out a few functions for easier maintenance.
> There will be a slight performance hit due to the
> less narrow context.
>
> Use structure assignment vs memcpy in a few places.
> Remove unneeded memset.
> Rate limit a vlog.
> Remove a couple hand-rolled netlink parsings.
>
> v5->v6: Add releases file NAT alert, as pointed out by Flavio.
> Add some missing details in commit message in a couple
> patches as mentioned by Flavio.
> Flushed the bug queue; found a couple of bugs in testing
> over the last week.
> a) nat_range_hash was missing the intended conn entry
>    address and port fields :-); I guess this was missed
>    because the corresponding nat info address and port
>    fields were there.
> b) The netlink parsing math was off for the min/max
>    address in the NAT range.
>
> v4->v5: Remove packet sorting in userspace datapath conntrack.
> Simplify conntrack state code.
> Fix sparse error.
> Address code review comments from Daniele.
>
> v3->v4: Fix rev_key vs key for nat_conn_keys access in a couple of
> places; this would have affected cleanup; at the same time
> rename some variables and change nat_conn_keys APIs to
> use conn key, rather than conn.
>
> Fix conntrack_flush() CT_CONN_TYPE_DEFAULT flag placement;
> the intention was that it be the same as in sweep_bucket().
>
> Fix nat_ipv6_addrs_delta() max boundary checking logic. I
> also enhanced the conntrack - IPv6 HTTP with NAT test to
> give it more coverage as partial penance.
>
> Rebase
>
> v2->v3: Fix a theoretical resend for closed connection restart.
> Parse out a function to help and also limit
> conn_state_update() to one.
>
> I decided to cap the IPv6 address range delta at 4 billion using
> an internal adjustment (user visibility not required).
>
> Some cleanup of deprecated code path.
>
> Parse out some more changes as separate patches.
>
> v1->v2: Updates/fixes that were missed in v1 patches.
>
> Darrell Ball (9):
>   dpdk: Parse NAT netlink for userspace datapath.
>   dpdk: Remove batch sorting in userspace conntrack.
>   dpdk: Userspace Datapath: Introduce NAT Support.
>   dpdk: Add more ICMP Related NAT support.
>   dpdk: Add orig tuple context recovery.
>   System Tests: Enhance NAT tests.
>   Add some system test fixes
>   dpdk: Enable NAT tests for userspace datapath.
>   dpdk: Update feature alert documentation.
>
>  Documentation/faq/releases.rst   |    2 +-
>  NEWS                             |    2 +
>  lib/conntrack-private.h          |   25 +-
>  lib/conntrack.c                  | 1033 +-
>  lib/conntrack.h                  |   76 ++-
>  lib/dpif-netdev.c                |   85 +++-
>  lib/packets.h                    |    7 +
>  tests/atlocal.in                 |    3 +
>  tests/system-traffic.at          |  113 -
>  tests/system-userspace-macros.at |    7 +-
>  tests/test-conntrack.c           |    9 +-
>  11 files changed, 1198 insertions(+), 164 deletions(-)
>
> --
> 1.9.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

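The v2->v3 note above about capping the IPv6 address range delta at 4 billion can be illustrated with a small C sketch. It assumes the cap is UINT32_MAX and that addresses are passed as 16-byte network-order arrays (as in `struct in6_addr`'s `s6_addr`); `ipv6_range_delta_capped()` is a hypothetical helper, not the code from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch series): returns max - min for
 * two IPv6 addresses given as 16-byte network-order arrays, capped at
 * UINT32_MAX.  The caller must pass min <= max. */
static uint32_t
ipv6_range_delta_capped(const uint8_t min[16], const uint8_t max[16])
{
    uint8_t diff[16];
    int borrow = 0;

    /* 128-bit subtraction, starting from the least significant byte. */
    for (int i = 15; i >= 0; i--) {
        int d = max[i] - min[i] - borrow;
        borrow = d < 0;
        diff[i] = (uint8_t) (borrow ? d + 256 : d);
    }
    assert(!borrow);            /* A final borrow would mean min > max. */

    /* Any nonzero byte above the low 32 bits means delta >= 2^32. */
    for (int i = 0; i < 12; i++) {
        if (diff[i]) {
            return UINT32_MAX;
        }
    }
    return ((uint32_t) diff[12] << 24) | ((uint32_t) diff[13] << 16)
           | ((uint32_t) diff[14] << 8) | diff[15];
}
```

With the `fc00::240-fc00::241` SNAT range from the enhanced IPv6 test, the sketch returns 1; any range spanning 2^32 or more addresses returns UINT32_MAX.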

Re: [ovs-dev] [patch_v7 6/9] System Tests: Enhance NAT tests.

2017-04-29 Thread Daniele Di Proietto
Not sure this is very important, but so far we have managed to avoid using
tcpdump in the tests.  Would it be possible to use ovs-ofctl monitor instead?

In any case, it probably shouldn't be prefixed by sudo.

2017-03-24 2:15 GMT-07:00 Darrell Ball :
> Two new tests are added and two other tests are
> enhanced.
>
> Signed-off-by: Darrell Ball 
> ---
>  tests/atlocal.in        |   3 ++
>  tests/system-traffic.at | 109 +++-
>  2 files changed, 110 insertions(+), 2 deletions(-)
>
> diff --git a/tests/atlocal.in b/tests/atlocal.in
> index bc2480b..67ebf0d 100644
> --- a/tests/atlocal.in
> +++ b/tests/atlocal.in
> @@ -152,6 +152,9 @@ else
>  NC_EOF_OPT="-q 1"
>  fi
>
> +# Set HAVE_TCPDUMP
> +find_command tcpdump
> +
>  CURL_OPT="-g -v --max-time 1 --retry 2 --retry-delay 1 --connect-timeout 1"
>
>  # Turn off proxies.
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index 9861fb1..59eae7e 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -2668,6 +2668,7 @@ AT_CLEANUP
>
>  AT_SETUP([conntrack - ICMP related with NAT])
>  AT_SKIP_IF([test $HAVE_NC = no])
> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>  CHECK_CONNTRACK()
>  CHECK_CONNTRACK_NAT()
>  OVS_TRAFFIC_VSWITCHD_START()
> @@ -2703,6 +2704,10 @@ table=10 priority=0 action=drop
>
>  AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
>
> +rm p0.pcap
> +tcpdump -U -i ovs-p0 -w p0.pcap &
> +sleep 1
> +
>  dnl UDP packets from ns0->ns1 should solicit "destination unreachable" response.
>  NS_CHECK_EXEC([at_ns0], [bash -c "echo a | nc $NC_EOF_OPT -u 10.1.1.2 10000"])
>
> @@ -2724,6 +2729,8 @@ AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2) | sed -e 's/dst=
>  udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=<cleared>,dport=<cleared>),reply=(src=10.1.1.2,dst=10.1.1.2XX,sport=<cleared>,dport=<cleared>),mark=1
>  ])
>
> +AT_CHECK([sudo tcpdump -v "icmp" -r p0.pcap 2>/dev/null | egrep 'wrong|bad'], [1], [ignore-nolog])
> +
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> @@ -3028,7 +3035,7 @@ dnl Check that ct(nat,table=foo) works with TCP sequence adjustment with
>  dnl an ACL table based on matching on conntrack original direction tuple only.
>  CHECK_FTP_NAT_ORIG_TUPLE([seqadj], [10.1.1.240], [0x0a0101f0])
>
> -AT_SETUP([conntrack - IPv6 HTTP with NAT])
> +AT_SETUP([conntrack - IPv6 HTTP with SNAT])
>  CHECK_CONNTRACK()
>  CHECK_CONNTRACK_NAT()
>  OVS_TRAFFIC_VSWITCHD_START()
> @@ -3039,15 +3046,17 @@ ADD_VETH(p0, at_ns0, br0, "fc00::1/96")
>  NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address 80:88:88:88:88:88])
>  ADD_VETH(p1, at_ns1, br0, "fc00::2/96")
>  NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::240 lladdr 80:88:88:88:88:88 dev p1])
> +NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::241 lladdr 80:88:88:88:88:88 dev p1])
>
>  dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
>  AT_DATA([flows.txt], [dnl
>  priority=1,action=drop
>  priority=10,icmp6,action=normal
> -priority=100,in_port=1,ip6,action=ct(commit,nat(src=fc00::240)),2
> +priority=100,in_port=1,ip6,action=ct(commit,nat(src=fc00::240-fc00::241)),2
>  priority=100,in_port=2,ct_state=-trk,ip6,action=ct(nat,table=0)
>  priority=100,in_port=2,ct_state=+trk+est,ip6,action=1
>  priority=200,in_port=2,ct_state=+trk+new,icmp6,icmpv6_code=0,icmpv6_type=135,nd_target=fc00::240,action=ct(commit,nat(dst=fc00::1)),1
> +priority=200,in_port=2,ct_state=+trk+new,icmp6,icmpv6_code=0,icmpv6_type=135,nd_target=fc00::241,action=ct(commit,nat(dst=fc00::1)),1
>  ])
>
>  AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> @@ -3070,6 +3079,102 @@ NS_CHECK_EXEC([at_ns1], [wget http://[[fc00::1]] -t 3 -T 1 -v -o wget1.log], [4]
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([conntrack - IPv6 HTTP with DNAT])
> +CHECK_CONNTRACK()
> +CHECK_CONNTRACK_NAT()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "fc00::1/96")
> +ADD_VETH(p1, at_ns1, br0, "fc00::2/96")
> +NS_CHECK_EXEC([at_ns0], [ip -6 link set dev p0 address 80:88:88:88:88:77])
> +NS_CHECK_EXEC([at_ns1], [ip -6 link set dev p1 address 80:88:88:88:88:88])
> +NS_CHECK_EXEC([at_ns0], [ip -6 neigh add fc00::240 lladdr 80:88:88:88:88:88 dev p0])
> +NS_CHECK_EXEC([at_ns1], [ip -6 neigh add fc00::1 lladdr 80:88:88:88:88:77 dev p1])
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=100 in_port=1,ip6,ipv6_dst=fc00::240,action=ct(zone=1,nat(dst=fc00::2),commit),2
> +priority=100 in_port=2,ct_state=-trk,ip6,action=ct(table=0,nat,zone=1)
> +priority=100 in_port=2,ct_state=+trk+est,ct_zone=1,ip6,action=1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested 

Re: [ovs-dev] [patch_v7 5/9] dpdk: Add orig tuple context recovery.

2017-04-29 Thread Daniele Di Proietto
2017-03-24 2:15 GMT-07:00 Darrell Ball :
> This patch adds orig tuple checking and context
> recovery; NAT interactions are factored in.
> Orig tuple support exists to better handle policy
> changes.
>
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack.c | 69 +
>  1 file changed, 69 insertions(+)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index ee22280..5524495 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -679,6 +679,72 @@ handle_nat(struct dp_packet *pkt, struct conn *conn,
>  }
>  }
>
> +static bool
> +check_orig_tuple(struct conntrack *ct, struct dp_packet *pkt,
> +                 struct conn_lookup_ctx *ctx_in, long long now,
> +                 unsigned *bucket, struct conn **conn,
> +                 const struct nat_action_info_t *nat_action_info)
> +    OVS_REQUIRES(ct->buckets[*bucket].lock)
> +{
> +    if ((ctx_in->key.dl_type == htons(ETH_TYPE_IP) &&
> +         !pkt->md.ct_orig_tuple.ipv4.ipv4_proto) ||
> +        (ctx_in->key.dl_type == htons(ETH_TYPE_IPV6) &&
> +         !pkt->md.ct_orig_tuple.ipv6.ipv6_proto) ||
> +        !(pkt->md.ct_state & (CS_SRC_NAT | CS_DST_NAT)) ||
> +        nat_action_info){
> +        return false;
> +    }
> +
> +    ct_lock_unlock(&ct->buckets[*bucket].lock);
> +    struct conn_lookup_ctx ctx;
> +    memset(&ctx, 0 , sizeof ctx);
> +    ctx.conn = NULL;
> +
> +    if (ctx_in->key.dl_type == htons(ETH_TYPE_IP)) {
> +
> +        ctx.key.src.addr.ipv4_aligned = pkt->md.ct_orig_tuple.ipv4.ipv4_src;
> +        ctx.key.dst.addr.ipv4_aligned = pkt->md.ct_orig_tuple.ipv4.ipv4_dst;
> +
> +        if (ctx_in->key.nw_proto == IPPROTO_ICMP) {
> +            ctx.key.src.icmp_id = ctx_in->key.src.icmp_id;
> +            ctx.key.dst.icmp_id = ctx_in->key.dst.icmp_id;
> +            uint16_t src_port = ntohs(pkt->md.ct_orig_tuple.ipv4.src_port);
> +            ctx.key.src.icmp_type = (uint8_t) src_port;
> +            ctx.key.dst.icmp_type = reverse_icmp_type(ctx.key.src.icmp_type);
> +        } else {
> +            ctx.key.src.port = pkt->md.ct_orig_tuple.ipv4.src_port;
> +            ctx.key.dst.port = pkt->md.ct_orig_tuple.ipv4.dst_port;
> +        }
> +        ctx.key.nw_proto = pkt->md.ct_orig_tuple.ipv4.ipv4_proto;
> +    } else {
> +        ctx.key.src.addr.ipv6_aligned = pkt->md.ct_orig_tuple.ipv6.ipv6_src;
> +        ctx.key.dst.addr.ipv6_aligned = pkt->md.ct_orig_tuple.ipv6.ipv6_dst;
> +
> +        if (ctx_in->key.nw_proto == IPPROTO_ICMPV6) {
> +            ctx.key.src.icmp_id = ctx_in->key.src.icmp_id;
> +            ctx.key.dst.icmp_id = ctx_in->key.dst.icmp_id;
> +            uint16_t src_port = ntohs(pkt->md.ct_orig_tuple.ipv6.src_port);
> +            ctx.key.src.icmp_type = (uint8_t) src_port;
> +            ctx.key.dst.icmp_type = reverse_icmp6_type(ctx.key.src.icmp_type);
> +        } else {
> +            ctx.key.src.port = pkt->md.ct_orig_tuple.ipv6.src_port;
> +            ctx.key.dst.port = pkt->md.ct_orig_tuple.ipv6.dst_port;
> +        }
> +        ctx.key.nw_proto = pkt->md.ct_orig_tuple.ipv6.ipv6_proto;
> +    }
> +
> +    ctx.key.dl_type = ctx_in->key.dl_type;
> +    ctx.key.zone = pkt->md.ct_zone;
> +
> +    ctx.hash = conn_key_hash(&ctx.key, ct->hash_basis);
> +    *bucket = hash_to_bucket(ctx.hash);
> +    ct_lock_lock(&ct->buckets[*bucket].lock);
> +    conn_key_lookup(&ct->buckets[*bucket], &ctx, now);
> +    *conn = ctx.conn;
> +
> +    return *conn ? true : false;
> +}
> +
>  static void
>  process_one(struct conntrack *ct, struct dp_packet *pkt,
>              struct conn_lookup_ctx *ctx, uint16_t zone,
> @@ -735,6 +801,9 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
>          if (nat_action_info && !create_new_conn) {
>              handle_nat(pkt, conn, zone, ctx->reply, ctx->related);
>          }
> +
> +    }else if (check_orig_tuple(ct, pkt, ctx, now, &bucket, &conn,
> +                               nat_action_info)) {

Sorry, I don't understand the feature very well, so I'm going to ask a
couple of dumb questions :-)

Why is the body of this 'else if' empty? Could you explain a little bit more
in the commit message?

Thanks

>      } else {
>          if (ctx->related) {
>              pkt->md.ct_state = CS_INVALID;
> --
> 1.9.1
>

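For readers following the orig tuple question above: `check_orig_tuple()` in patch 5/9 rebuilds a lookup key from the packet's recorded original-direction tuple and then looks that key up in its bucket. For ICMP it reads the type from the tuple's `src_port` (network byte order) and derives the opposite direction's type with `reverse_icmp_type()`. A simplified, IPv4-only C sketch of the key rebuild follows; the structures and function names are hypothetical stand-ins rather than OVS types, and the type mapping covers only the echo, timestamp and information request/reply pairs, returning other types unchanged (an assumption made for brevity).

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons, ntohs */
#include <netinet/in.h>  /* IPPROTO_ICMP */

/* Hypothetical stand-in for the IPv4 orig tuple in packet metadata.
 * Ports are in network byte order; for ICMP, src_port holds the type. */
struct orig_tuple_v4_sketch {
    uint32_t src, dst;
    uint16_t src_port, dst_port;
    uint8_t proto;                 /* 0 means "no orig tuple recorded" */
};

/* Hypothetical, simplified lookup key. */
struct key_sketch {
    uint32_t src, dst;
    uint16_t src_port, dst_port;   /* used for non-ICMP protocols */
    uint16_t icmp_id;              /* copied from the current packet */
    uint8_t src_icmp_type, dst_icmp_type;
    uint8_t proto;
};

static uint8_t
reverse_icmp_type_sketch(uint8_t type)
{
    switch (type) {
    case 8:  return 0;     /* echo request -> echo reply */
    case 0:  return 8;
    case 13: return 14;    /* timestamp -> timestamp reply */
    case 14: return 13;
    case 15: return 16;    /* information request -> reply */
    case 16: return 15;
    default: return type;  /* assumption: other types left unchanged */
    }
}

static struct key_sketch
key_from_orig_tuple(const struct orig_tuple_v4_sketch *t, uint16_t icmp_id)
{
    struct key_sketch k = {0};

    assert(t->proto != 0);     /* the caller bails out before this case */
    k.src = t->src;
    k.dst = t->dst;
    k.proto = t->proto;
    if (t->proto == IPPROTO_ICMP) {
        k.icmp_id = icmp_id;
        k.src_icmp_type = (uint8_t) ntohs(t->src_port);
        k.dst_icmp_type = reverse_icmp_type_sketch(k.src_icmp_type);
    } else {
        k.src_port = t->src_port;
        k.dst_port = t->dst_port;
    }
    return k;
}
```

For an echo request recorded in the orig tuple (type 8), the rebuilt key carries type 8 in one direction and echo reply (type 0) in the other.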

Re: [ovs-dev] [patch_v7 2/9] dpdk: Remove batch sorting in userspace conntrack.

2017-04-29 Thread Daniele Di Proietto
Thanks for doing this

Acked-by: Daniele Di Proietto <diproiet...@ovn.org>

2017-03-24 2:15 GMT-07:00 Darrell Ball <dlu...@gmail.com>:
> Signed-off-by: Darrell Ball <dlu...@gmail.com>
> Acked-by: Flavio Leitner <f...@sysclose.org>
> ---
>  lib/conntrack.c | 58 +++--
>  1 file changed, 11 insertions(+), 47 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 4f490fb..9a0763e 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -318,22 +318,9 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
>  {
>      struct dp_packet **pkts = pkt_batch->packets;
>      size_t cnt = pkt_batch->count;
> -#if !defined(__CHECKER__) && !defined(_WIN32)
> -    const size_t KEY_ARRAY_SIZE = cnt;
> -#else
> -    enum { KEY_ARRAY_SIZE = NETDEV_MAX_BURST };
> -#endif
> -    struct conn_lookup_ctx ctxs[KEY_ARRAY_SIZE];
> -    int8_t bucket_list[CONNTRACK_BUCKETS];
> -    struct {
> -        unsigned bucket;
> -        unsigned long maps;
> -    } arr[KEY_ARRAY_SIZE];
> +    struct conn_lookup_ctx ctx;
>      long long now = time_msec();
>      size_t i = 0;
> -    uint8_t arrcnt = 0;
> -
> -    BUILD_ASSERT_DECL(sizeof arr[0].maps * CHAR_BIT >= NETDEV_MAX_BURST);
>
>      if (helper) {
>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
> @@ -342,48 +329,25 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
>          /* Continue without the helper */
>      }
>
> -    memset(bucket_list, INT8_C(-1), sizeof bucket_list);
>      for (i = 0; i < cnt; i++) {
> -        unsigned bucket;
>
> -        if (!conn_key_extract(ct, pkts[i], dl_type, &ctxs[i], zone)) {
> +        if (!conn_key_extract(ct, pkts[i], dl_type, &ctx, zone)) {
>              write_ct_md(pkts[i], CS_INVALID, zone, NULL, NULL);
>              continue;
>          }
>
> -        bucket = hash_to_bucket(ctxs[i].hash);
> -        if (bucket_list[bucket] == INT8_C(-1)) {
> -            bucket_list[bucket] = arrcnt;
> -
> -            arr[arrcnt].maps = 0;
> -            ULLONG_SET1(arr[arrcnt].maps, i);
> -            arr[arrcnt++].bucket = bucket;
> -        } else {
> -            ULLONG_SET1(arr[bucket_list[bucket]].maps, i);
> -        }
> -    }
> -
> -    for (i = 0; i < arrcnt; i++) {
> -        struct conntrack_bucket *ctb = &ct->buckets[arr[i].bucket];
> -        size_t j;
> -
> +        struct conntrack_bucket *ctb = &ct->buckets[i];
>          ct_lock_lock(&ctb->lock);
> +        conn_key_lookup(ctb, &ctx, now);
> +        struct conn *conn = process_one(ct, pkts[i], &ctx, zone,
> +                                        force, commit, now);
>
> -        ULLONG_FOR_EACH_1(j, arr[i].maps) {
> -            struct conn *conn;
> -
> -            conn_key_lookup(ctb, &ctxs[j], now);
> -
> -            conn = process_one(ct, pkts[j], &ctxs[j], zone, force, commit,
> -                               now);
> -
> -            if (conn && setmark) {
> -                set_mark(pkts[j], conn, setmark[0], setmark[1]);
> -            }
> +        if (conn && setmark) {
> +            set_mark(pkts[i], conn, setmark[0], setmark[1]);
> +        }
>
> -            if (conn && setlabel) {
> -                set_label(pkts[j], conn, &setlabel[0], &setlabel[1]);
> -            }
> +        if (conn && setlabel) {
> +            set_label(pkts[i], conn, &setlabel[0], &setlabel[1]);
>          }
>          ct_lock_unlock(&ctb->lock);
>      }
> --
> 1.9.1
>

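In the userspace conntrack, a packet's bucket is derived from its key hash, as `hash_to_bucket(ctxs[i].hash)` does in the code removed by patch 2/9. Without batch sorting, each packet is handled on its own: pick its bucket, take that bucket's lock, look up and process, then release. A minimal C sketch of that per-packet pattern with POSIX mutexes; `N_BUCKETS`, `struct bucket` and `hash_to_bucket_sketch()` are hypothetical stand-ins, and a counter replaces the real connection lookup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 256   /* hypothetical stand-in for the bucket count */

/* Hypothetical bucket: a lock plus a counter standing in for the
 * bucket's connection map. */
struct bucket {
    pthread_mutex_t lock;
    unsigned int n_lookups;
};

static struct bucket buckets[N_BUCKETS];

static void
buckets_init(void)
{
    for (size_t i = 0; i < N_BUCKETS; i++) {
        pthread_mutex_init(&buckets[i].lock, NULL);
        buckets[i].n_lookups = 0;
    }
}

static unsigned int
hash_to_bucket_sketch(uint32_t hash)
{
    return hash % N_BUCKETS;
}

/* Handles a batch one packet at a time: each packet's hash picks its
 * bucket, and only that bucket's lock is held while it is processed. */
static void
process_batch(const uint32_t *hashes, size_t cnt)
{
    for (size_t i = 0; i < cnt; i++) {
        unsigned int b = hash_to_bucket_sketch(hashes[i]);

        assert(b < N_BUCKETS);
        pthread_mutex_lock(&buckets[b].lock);
        buckets[b].n_lookups++;   /* lookup + per-packet processing here */
        pthread_mutex_unlock(&buckets[b].lock);
    }
}
```

The trade-off is one lock/unlock pair per packet instead of one per distinct bucket per batch, in exchange for dropping the sorting bookkeeping (`bucket_list`, `arr`, the bitmaps).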

Re: [ovs-dev] [PATCH] tests/pmd.at: Fix race in "PMD - change numa node" test

2017-04-29 Thread Daniele Di Proietto
Thanks!

Applied to master, branch-2.7 and branch-2.6

2017-04-21 6:38 GMT-07:00 Timothy Redaelli <tredae...@redhat.com>:
> Sometimes the test fails because dpif-netdev may process the 2 packets
> in the "wrong" order.
>
> This commit avoids the problem by monitoring and verifying each packet
> individually instead of checking both packets at the same time.
>
> CC: Daniele Di Proietto <daniele.di.proie...@gmail.com>
> Fixes: a12e2a88d672 ("test: Add more pmd tests.")
> Signed-off-by: Timothy Redaelli <tredae...@redhat.com>
> ---
>  tests/pmd.at | 28 ++--
>  1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/tests/pmd.at b/tests/pmd.at
> index 5686bedc..2816d45c 100644
> --- a/tests/pmd.at
> +++ b/tests/pmd.at
> @@ -338,14 +338,22 @@ AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile
>
>  AT_CHECK([ovs-appctl netdev-dummy/receive p1 --qid 0 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
>
> -AT_CHECK([ovs-appctl netdev-dummy/receive p2 --qid 1 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
> -
> -OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 4])
> +OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 2])
>  OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
>
>  AT_CHECK([cat ofctl_monitor.log], [0], [dnl
>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
>  icmp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.0.0.2,nw_dst=10.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
>  icmp_csum:f7ff
> +])
> +
> +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> +
> +AT_CHECK([ovs-appctl netdev-dummy/receive p2 --qid 1 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
> +
> +OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 2])
> +OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
> +
> +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=2 (via action) data_len=42 (unbuffered)
>  icmp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.0.0.2,nw_dst=10.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
>  icmp_csum:f7ff
>  ])
> @@ -363,14 +371,22 @@ AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile
>
>  AT_CHECK([ovs-appctl netdev-dummy/receive p1 --qid 1 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
>
> -AT_CHECK([ovs-appctl netdev-dummy/receive p2 --qid 0 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
> -
> -OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 4])
> +OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 2])
>  OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
>
>  AT_CHECK([cat ofctl_monitor.log], [0], [dnl
>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
>  icmp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.0.0.2,nw_dst=10.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
>  icmp_csum:f7ff
> +])
> +
> +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> +
> +AT_CHECK([ovs-appctl netdev-dummy/receive p2 --qid 0 'in_port(1),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
> +
> +OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 2])
> +OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
> +
> +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
>  NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=2 (via action) data_len=42 (unbuffered)
>  icmp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.0.0.2,nw_dst=10.0.0.1,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
>  icmp_csum:f7ff
>  ])
> --
> 2.12.0
>


[ovs-dev] [RFC] packets: Do not initialize ct_orig_tuple.

2017-03-10 Thread Daniele Di Proietto
Commit daf4d3c18da4 ("odp: Support conntrack orig tuple key.") introduced
new fields in struct 'pkt_metadata'.  pkt_metadata_init() is called for
every packet in the userspace datapath.  When testing a simple single
flow case with DPDK, we observe lower throughput after the above
commit (14.88 Mpps before, 13 Mpps after).

This patch skips initializing ct_orig_tuple in pkt_metadata_init().
It should be enough to initialize ct_state, because nobody should look
at ct_orig_tuple unless ct_state is nonzero.

CC: Jarno Rajahalme <ja...@ovn.org>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
I'm sending this as an RFC because I didn't check very carefully if we can
really avoid initializing ct_orig_tuple.

Maybe there are better solutions to this problem.
---
 lib/packets.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/packets.h b/lib/packets.h
index a5a483bc8..6f1791c7a 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -129,7 +129,7 @@ pkt_metadata_init(struct pkt_metadata *md, odp_port_t port)
     /* It can be expensive to zero out all of the tunnel metadata. However,
      * we can just zero out ip_dst and the rest of the data will never be
      * looked at. */
-    memset(md, 0, offsetof(struct pkt_metadata, in_port));
+    memset(md, 0, offsetof(struct pkt_metadata, ct_orig_tuple));
     md->tunnel.ip_dst = 0;
     md->tunnel.ipv6_dst = in6addr_any;
 
-- 
2.11.0


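The RFC relies on two properties: `pkt_metadata_init()` zeroes only the part of the structure that precedes `ct_orig_tuple`, and nothing reads `ct_orig_tuple` unless `ct_state` is nonzero. A minimal C sketch of that pattern, using a hypothetical, simplified metadata struct rather than the real `struct pkt_metadata` layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified metadata: per-packet fields first, then the
 * orig tuple, which is only meaningful when ct_state != 0. */
struct md_sketch {
    uint32_t recirc_id;
    uint8_t ct_state;              /* 0: no conntrack state */
    uint16_t ct_zone;
    struct {
        uint32_t src, dst;
        uint16_t src_port, dst_port;
        uint8_t proto;
    } ct_orig_tuple;
};

/* The guard flag must itself lie inside the zeroed prefix. */
static_assert(offsetof(struct md_sketch, ct_state)
              < offsetof(struct md_sketch, ct_orig_tuple),
              "ct_state must be zeroed by md_sketch_init()");

static void
md_sketch_init(struct md_sketch *md)
{
    /* Zero everything before ct_orig_tuple; the tuple keeps stale data. */
    memset(md, 0, offsetof(struct md_sketch, ct_orig_tuple));
}

/* Readers consult ct_state before trusting ct_orig_tuple. */
static uint8_t
md_sketch_orig_proto(const struct md_sketch *md)
{
    return md->ct_state ? md->ct_orig_tuple.proto : 0;
}
```

The `static_assert` captures the invariant the RFC depends on: the field that guards the tuple must be reset on every packet, even though the tuple itself is not.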

Re: [ovs-dev] [PATCH v2 branch-2.6] docs: Use DPDK 16.07.2 stable release

2017-03-10 Thread Daniele Di Proietto
2017-03-10 3:47 GMT-08:00 Ian Stokes :
> DPDK now provides a stable release branch. Modify dpdk docs and travis
> linux build script to use the DPDK 16.07.2 stable branch to benefit from
> most recent bug fixes.
>
> Signed-off-by: Ian Stokes 

Thanks, applied to branch-2.6

> ---
> v1 -> v2
> * Set correct path to DPDK stable branch for EXTRA_OPTS in travis linux
>   build.
> ---
>  .travis/linux-build.sh   |   14 +++---
>  FAQ.md   |4 ++--
>  INSTALL.DPDK-ADVANCED.md |6 +++---
>  INSTALL.DPDK.md  |   22 ++
>  4 files changed, 22 insertions(+), 24 deletions(-)
>
> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
> index 3bcec93..f15f706 100755
> --- a/.travis/linux-build.sh
> +++ b/.travis/linux-build.sh
> @@ -52,13 +52,13 @@ function install_kernel()
>  function install_dpdk()
>  {
>      if [ -n "$DPDK_GIT" ]; then
> -        git clone $DPDK_GIT dpdk-$1
> -        cd dpdk-$1
> -        git checkout v$1
> +        git clone $DPDK_GIT dpdk-stable-$1
> +        cd dpdk-stable-$1
> +        git checkout tags/v$1
>      else
> -        wget http://www.dpdk.org/browse/dpdk/snapshot/dpdk-$1.tar.gz
> +        wget http://fast.dpdk.org/rel/dpdk-$1.tar.gz
>          tar xzvf dpdk-$1.tar.gz > /dev/null
> -        cd dpdk-$1
> +        cd dpdk-stable-$1
>      fi
>      find ./ -type f | xargs sed -i 's/max-inline-insns-single=100/max-inline-insns-single=400/'
>      echo 'CONFIG_RTE_BUILD_FPIC=y' >>config/common_linuxapp
> @@ -80,14 +80,14 @@ fi
>
>  if [ "$DPDK" ]; then
>      if [ -z "$DPDK_VER" ]; then
> -        DPDK_VER="16.07"
> +        DPDK_VER="16.07.2"
>      fi
>      install_dpdk $DPDK_VER
>      if [ "$CC" = "clang" ]; then
>          # Disregard cast alignment errors until DPDK is fixed
>          CFLAGS="$CFLAGS -Wno-cast-align"
>      fi
> -    EXTRA_OPTS="$EXTRA_OPTS --with-dpdk=./dpdk-$DPDK_VER/build"
> +    EXTRA_OPTS="$EXTRA_OPTS --with-dpdk=./dpdk-stable-$DPDK_VER/build"
>  elif [ "$CC" != "clang" ]; then
>      # DPDK headers currently trigger sparse errors
>      SPARSE_FLAGS="$SPARSE_FLAGS -Wsparse-error"
> diff --git a/FAQ.md b/FAQ.md
> index cf30f9b..75a393b 100644
> --- a/FAQ.md
> +++ b/FAQ.md
> @@ -256,12 +256,12 @@ A: The following table lists the DPDK version against which the
> given versions of Open vSwitch will successfully build.
>
>  | Open vSwitch | DPDK
> -|::|:-:
> +|::|:---:
>  |2.2.x | 1.6
>  |2.3.x | 1.6
>  |2.4.x | 2.0
>  |2.5.x | 2.2
> -|2.6.x | 16.07
> +|2.6.x | 16.07.2
>
>  ### Q: I get an error like this when I configure Open vSwitch:
>
> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> index e3603a1..ae21aca 100755
> --- a/INSTALL.DPDK-ADVANCED.md
> +++ b/INSTALL.DPDK-ADVANCED.md
> @@ -46,7 +46,7 @@ for DPDK and OVS.
>  For IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
>
>  ```
> -export DPDK_DIR=/usr/src/dpdk-16.07
> +export DPDK_DIR=/usr/src/dpdk-stable-16.07.2
>  export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>  make install T=$DPDK_TARGET DESTDIR=install
>  ```
> @@ -342,7 +342,7 @@ For users wanting to do packet forwarding using kernel stack below are the steps
>     cd /usr/src/cmdline_generator
>     wget https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/cmdline_generator.c
>     wget https://raw.githubusercontent.com/netgroup-polito/un-orchestrator/master/orchestrator/compute_controller/plugins/kvm-libvirt/cmdline_generator/Makefile
> -   export RTE_SDK=/usr/src/dpdk-16.07
> +   export RTE_SDK=/usr/src/dpdk-stable-16.07.2
>     export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
>     make
>     ./build/cmdline_generator -m -p dpdkr0 XXX
> @@ -366,7 +366,7 @@ For users wanting to do packet forwarding using kernel stack below are the steps
>     mount -t hugetlbfs nodev /dev/hugepages (if not already mounted)
>
>     # Build the DPDK ring application in the VM
> -   export RTE_SDK=/root/dpdk-16.07
> +   export RTE_SDK=/root/dpdk-stable-16.07.2
>     export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
>     make
>
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 30e9258..9ab29f3 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -21,7 +21,7 @@ The DPDK support of Open vSwitch is considered 'experimental'.
>
>  ### Prerequisites
>
> -* Required: DPDK 16.07
> +* Required: DPDK 16.07.2
>  * Hardware: [DPDK Supported NICs] when physical ports in use
>
>  ##  2. Building and Installation
> @@ -42,10 +42,9 @@ advanced install guide [INSTALL.DPDK-ADVANCED.md]
>
>   ```
>   cd /usr/src/
> - wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.07.zip
> - unzip dpdk-16.07.zip
> -
> - export DPDK_DIR=/usr/src/dpdk-16.07
> + 

Re: [ovs-dev] [PATCH v2] docs: Use DPDK 16.11.1 stable release.

2017-03-10 Thread Daniele Di Proietto
2017-03-10 3:47 GMT-08:00 Ian Stokes :
> DPDK now provides a stable release branch. Modify dpdk docs and travis linux
> build script to use the DPDK 16.11.1 stable branch to benefit from most
> recent bug fixes.
>
> Signed-off-by: Ian Stokes 

Thanks, applied to master and branch-2.7

> ---
> v1 -> v2
> * Set correct path to DPDK stable branch for EXTRA_OPTS in travis linux
>   build.
> ---
>  .travis/linux-build.sh   |   12 ++--
>  Documentation/faq/releases.rst   |   10 +-
>  Documentation/intro/install/dpdk.rst |6 +++---
>  Documentation/topics/dpdk/vhost-user.rst |8 
>  4 files changed, 18 insertions(+), 18 deletions(-)
>
> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
> index 4175d72..8750d68 100755
> --- a/.travis/linux-build.sh
> +++ b/.travis/linux-build.sh
> @@ -52,13 +52,13 @@ function install_kernel()
>  function install_dpdk()
>  {
>      if [ -n "$DPDK_GIT" ]; then
> -        git clone $DPDK_GIT dpdk-$1
> -        cd dpdk-$1
> -        git checkout v$1
> +        git clone $DPDK_GIT dpdk-stable-$1
> +        cd dpdk-stable-$1
> +        git checkout tags/v$1
>      else
>          wget http://fast.dpdk.org/rel/dpdk-$1.tar.gz
>          tar xzvf dpdk-$1.tar.gz > /dev/null
> -        cd dpdk-$1
> +        cd dpdk-stable-$1
>      fi
>      find ./ -type f | xargs sed -i 's/max-inline-insns-single=100/max-inline-insns-single=400/'
>      echo 'CONFIG_RTE_BUILD_FPIC=y' >>config/common_linuxapp
> @@ -80,14 +80,14 @@ fi
>
>  if [ "$DPDK" ]; then
>      if [ -z "$DPDK_VER" ]; then
> -        DPDK_VER="16.11"
> +        DPDK_VER="16.11.1"
>      fi
>      install_dpdk $DPDK_VER
>      if [ "$CC" = "clang" ]; then
>          # Disregard cast alignment errors until DPDK is fixed
>          CFLAGS="$CFLAGS -Wno-cast-align"
>      fi
> -    EXTRA_OPTS="$EXTRA_OPTS --with-dpdk=./dpdk-$DPDK_VER/build"
> +    EXTRA_OPTS="$EXTRA_OPTS --with-dpdk=./dpdk-stable-$DPDK_VER/build"
>  elif [ "$CC" != "clang" ]; then
>      # DPDK headers currently trigger sparse errors
>      SPARSE_FLAGS="$SPARSE_FLAGS -Wsparse-error"
> diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
> index 118c88d..98f5636 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -152,16 +152,16 @@ Q: What DPDK version does each Open vSwitch release work with?
>  A: The following table lists the DPDK version against which the given
>  versions of Open vSwitch will successfully build.
>
> -============ =====
> +============ =======
>  Open vSwitch DPDK
> -============ =====
> +============ =======
>  2.2.x        1.6
>  2.3.x        1.6
>  2.4.x        2.0
>  2.5.x        2.2
> -2.6.x        16.07
> -2.7.x        16.11
> -============ =====
> +2.6.x        16.07.2
> +2.7.x        16.11.1
> +============ =======
>
>  Q: I get an error like this when I configure Open vSwitch::
>
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index 3018590..b947bd5 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -64,9 +64,9 @@ Install DPDK
>  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
>
>     $ cd /usr/src/
> -   $ wget http://fast.dpdk.org/rel/dpdk-16.11.tar.xz
> -   $ tar xf dpdk-16.11.tar.xz
> -   $ export DPDK_DIR=/usr/src/dpdk-16.11
> +   $ wget http://fast.dpdk.org/rel/dpdk-16.11.1.tar.xz
> +   $ tar xf dpdk-16.11.1.tar.xz
> +   $ export DPDK_DIR=/usr/src/dpdk-stable-16.11.1
>     $ cd $DPDK_DIR
>
>  #. (Optional) Configure DPDK as a shared library
> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> index 5448bd2..ba22684 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -278,9 +278,9 @@ To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or
>  DPDK sources to VM and build DPDK::
>
>      $ cd /root/dpdk/
> -    $ wget http://fast.dpdk.org/rel/dpdk-16.11.tar.xz
> -    $ tar xf dpdk-16.11.tar.xz
> -    $ export DPDK_DIR=/root/dpdk/dpdk-16.11
> +    $ wget http://fast.dpdk.org/rel/dpdk-16.11.1.tar.xz
> +    $ tar xf dpdk-16.11.1.tar.xz
> +    $ export DPDK_DIR=/root/dpdk/dpdk-stable-16.11.1
>      $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
>      $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>      $ cd $DPDK_DIR
> @@ -364,7 +364,7 @@ Sample XML
>  
>  
>
> -  
> +  
>
>
>  
> --
> 1.7.0.7
>

[ovs-dev] [PATCH v3 3/3] ofp-actions: Add limit to learn action.

2017-03-10 Thread Daniele Di Proietto
This commit adds a new feature to the learn action: the ability to
limit the number of learned flows.

To be compatible with users of the old learn action, a new structure is
introduced as well as a new OpenFlow raw action number.
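
A minimal C sketch of the limit check combined with the result bit described by the NX_LEARN_F_WRITE_RESULT comment in this patch. `struct learn_sketch` and `learn_sketch_execute()` are hypothetical; each call is assumed to learn a distinct flow, and treating a limit of 0 as "no limit" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a learn action with a flow limit. */
struct learn_sketch {
    unsigned int limit;       /* assumption: 0 means "no limit" */
    unsigned int n_learned;   /* flows currently learned by this action */
    bool write_result;        /* models NX_LEARN_F_WRITE_RESULT */
    unsigned int result_bit;  /* bit of 'reg' that receives the result */
};

/* Tries to learn one new flow and returns true on success.  With
 * write_result set, the result bit is set on success and cleared on
 * failure (e.g. when the limit is hit). */
static bool
learn_sketch_execute(struct learn_sketch *l, uint32_t *reg)
{
    bool ok = l->limit == 0 || l->n_learned < l->limit;

    if (ok) {
        l->n_learned++;
    }
    if (l->write_result) {
        assert(l->result_bit < 32);
        uint32_t mask = UINT32_C(1) << l->result_bit;
        *reg = ok ? (*reg | mask) : (*reg & ~mask);
    }
    return ok;
}
```

With a limit of 2, the first two calls succeed and set the bit; the third fails and clears it, so later table stages can branch on the outcome.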

There's a small corner case when we have to delete the ukey.  This
happens when:
* The learned rule has expired (or has been deleted).
* The ukey that learned the rule is still in the datapath.
* No packets hit the datapath flow recently.
In this case we cannot relearn the rule (because there are no new
packets), and the actions might depend on the learn execution, so the
only option is to delete the ukey.  I don't think this has big
performance implications since it's done only for ukeys with no traffic.

We could also slowpath it, but that will cause an action upcall and the
correct datapath actions will be installed later by a revalidator.  If
we delete the ukey, the next upcall will be a miss upcall and that will
immediately install the correct datapath flow.
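The corner case above reduces to a small decision table. This is a hedged sketch only: `enum ukey_fate` and `learn_ukey_fate()` are hypothetical names, the precondition is that the ukey is still installed in the datapath, and the real revalidator logic in ofproto-dpif-upcall.c considers much more state.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a ukey whose datapath actions depend on a
 * learn action.  Precondition: the ukey is still in the datapath. */
enum ukey_fate {
    UKEY_KEEP,          /* Learned rule still exists: nothing to do. */
    UKEY_REVALIDATE,    /* Recent traffic: translation can relearn it. */
    UKEY_DELETE,        /* No traffic: delete; next packet is a miss. */
};

static enum ukey_fate
learn_ukey_fate(bool learned_rule_exists, bool recent_packets)
{
    if (learned_rule_exists) {
        return UKEY_KEEP;
    }
    /* The learned rule expired or was deleted.  Without packets the rule
     * cannot be relearned, so the only option is to drop the ukey. */
    return recent_packets ? UKEY_REVALIDATE : UKEY_DELETE;
}
```

Deleting instead of slowpathing means the next packet takes the miss-upcall path, which installs the correct flow directly rather than waiting for a revalidator.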

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 include/openvswitch/ofp-actions.h  |  12 +++
 lib/learn.c|  24 +
 lib/ofp-actions.c  |  88 -
 ofproto/ofproto-dpif-upcall.c  |   4 +
 ofproto/ofproto-dpif-xlate-cache.c |   3 +-
 ofproto/ofproto-dpif-xlate-cache.h |   1 +
 ofproto/ofproto-dpif-xlate.c   |  26 -
 ofproto/ofproto-dpif-xlate.h   |   3 +
 ofproto/ofproto-provider.h |   3 +-
 ofproto/ofproto.c  |  46 +++--
 tests/learn.at | 191 +
 tests/ofp-actions.at   |  14 +++
 utilities/ovs-ofctl.8.in   |  16 
 13 files changed, 419 insertions(+), 12 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
index d4be46f0c..6f554fe0c 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -660,6 +660,10 @@ struct ofpact_resubmit {
  * If NX_LEARN_F_SEND_FLOW_REM is set, then the learned flows will have their
  * OFPFF_SEND_FLOW_REM flag set.
  *
+ * If NX_LEARN_F_WRITE_RESULT is set, then the actions will write whether the
+ * learn operation succeeded on a bit.  If the learn is successful the bit will
+ * be set, otherwise (e.g. if the limit is hit) the bit will be unset.
+ *
  * If NX_LEARN_F_DELETE_LEARNED is set, then removing this action will delete
  * all the flows from the learn action's 'table_id' that have the learn
  * action's 'cookie'.  Important points:
@@ -685,6 +689,7 @@ struct ofpact_resubmit {
 enum nx_learn_flags {
 NX_LEARN_F_SEND_FLOW_REM = 1 << 0,
 NX_LEARN_F_DELETE_LEARNED = 1 << 1,
+NX_LEARN_F_WRITE_RESULT = 1 << 2,
 };
 
 #define NX_LEARN_N_BITS_MASK0x3ff
@@ -748,6 +753,13 @@ struct ofpact_learn {
 ovs_be64 cookie;   /* Cookie for new flow. */
 uint16_t fin_idle_timeout; /* Idle timeout after FIN, if nonzero. */
 uint16_t fin_hard_timeout; /* Hard timeout after FIN, if nonzero. */
+/* If the number of flows on 'table_id' with 'cookie' exceeds this,
+ * the action will not add a new flow. */
+uint32_t limit;
+/* Used only if 'flags' has NX_LEARN_F_WRITE_RESULT.  If the execution
+ * failed to install a new flow because 'limit' is exceeded,
+ * result_dst will be set to 0, otherwise to 1. */
+struct mf_subfield result_dst;
 );
 
 struct ofpact_learn_spec specs[];
diff --git a/lib/learn.c b/lib/learn.c
index 199084905..d747d255e 100644
--- a/lib/learn.c
+++ b/lib/learn.c
@@ -407,6 +407,22 @@ learn_parse__(char *orig, char *arg, struct ofpbuf *ofpacts)
 learn->flags |= NX_LEARN_F_SEND_FLOW_REM;
 } else if (!strcmp(name, "delete_learned")) {
 learn->flags |= NX_LEARN_F_DELETE_LEARNED;
+} else if (!strcmp(name, "limit")) {
+learn->limit = atoi(value);
+} else if (!strcmp(name, "result_dst")) {
+char *error;
+learn->flags |= NX_LEARN_F_WRITE_RESULT;
+error = mf_parse_subfield(>result_dst, value);
+if (error) {
+return error;
+}
+if (!learn->result_dst.field->writable) {
+return xasprintf("%s is read-only", value);
+}
+if (learn->result_dst.n_bits != 1) {
+return xasprintf("result_dst in 'learn' action must be a "
+ "single bit");
+}
 } else {
 struct ofpact_learn_spec *spec;
 char *error;
@@ -488,6 +504,14 @@ learn_format(const struct ofpact_learn *learn, struct ds *s)
 ds_put_format(s, ",%scookie=%s%#"PRIx64,
   colors.param, colors.end, ntohll(learn->cookie));
 }
+if (learn->

[ovs-dev] [PATCH v3 2/3] ofp-actions: Factor out decode_LEARN_{common, spec}().

2017-03-10 Thread Daniele Di Proietto
No functional change; they will be used by the next commit.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ofp-actions.c | 58 ++-
 1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 2c8ab1788..603435857 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4352,23 +4352,14 @@ learn_min_len(uint16_t header)
 return min_len;
 }
 
-/* Converts 'nal' into a "struct ofpact_learn" and appends that struct to
- * 'ofpacts'.  Returns 0 if successful, otherwise an OFPERR_*. */
 static enum ofperr
-decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
-   enum ofp_version ofp_version OVS_UNUSED,
-   const struct vl_mff_map *vl_mff_map,
-   struct ofpbuf *ofpacts)
+decode_LEARN_common(const struct nx_action_learn *nal,
+struct ofpact_learn *learn)
 {
-struct ofpact_learn *learn;
-const void *p, *end;
-
 if (nal->pad) {
 return OFPERR_OFPBAC_BAD_ARGUMENT;
 }
 
-learn = ofpact_put_LEARN(ofpacts);
-
 learn->idle_timeout = ntohs(nal->idle_timeout);
 learn->hard_timeout = ntohs(nal->hard_timeout);
 learn->priority = ntohs(nal->priority);
@@ -4376,19 +4367,23 @@ decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
 learn->table_id = nal->table_id;
 learn->fin_idle_timeout = ntohs(nal->fin_idle_timeout);
 learn->fin_hard_timeout = ntohs(nal->fin_hard_timeout);
-
 learn->flags = ntohs(nal->flags);
-if (learn->flags & ~(NX_LEARN_F_SEND_FLOW_REM |
- NX_LEARN_F_DELETE_LEARNED)) {
-return OFPERR_OFPBAC_BAD_ARGUMENT;
-}
 
 if (learn->table_id == 0xff) {
 return OFPERR_OFPBAC_BAD_ARGUMENT;
 }
 
-end = (char *) nal + ntohs(nal->len);
-for (p = nal + 1; p != end; ) {
+return 0;
+}
+
+static enum ofperr
+decode_LEARN_specs(const void *p, const void *end,
+   const struct vl_mff_map *vl_mff_map,
+   struct ofpbuf *ofpacts)
+{
+struct ofpact_learn *learn = ofpacts->header;
+
+while (p != end) {
 struct ofpact_learn_spec *spec;
 uint16_t header = ntohs(get_be16());
 
@@ -4461,6 +4456,33 @@ decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
 return 0;
 }
 
+/* Converts 'nal' into a "struct ofpact_learn" and appends that struct to
+ * 'ofpacts'.  Returns 0 if successful, otherwise an OFPERR_*. */
+static enum ofperr
+decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
+   enum ofp_version ofp_version OVS_UNUSED,
+   const struct vl_mff_map *vl_mff_map,
+   struct ofpbuf *ofpacts)
+{
+struct ofpact_learn *learn;
+enum ofperr error;
+
+learn = ofpact_put_LEARN(ofpacts);
+
+error = decode_LEARN_common(nal, learn);
+if (error) {
+return error;
+}
+
+if (learn->flags & ~(NX_LEARN_F_SEND_FLOW_REM |
+ NX_LEARN_F_DELETE_LEARNED)) {
+return OFPERR_OFPBAC_BAD_ARGUMENT;
+}
+
+return decode_LEARN_specs(nal + 1, (char *) nal + ntohs(nal->len),
+  vl_mff_map, ofpacts);
+}
+
 static void
 put_be16(struct ofpbuf *b, ovs_be16 x)
 {
-- 
2.11.0
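The point of the split is that the shared checks live in one helper while each encoding validates its own set of flags. A sketch of that shape, assuming (not confirmed by this patch) that the new NXAST_RAW_LEARN2 decoder accepts NX_LEARN_F_WRITE_RESULT in addition to the old flags; `decode_learn_v1()`, `decode_learn_v2()` and `enum decode_error` are hypothetical stand-ins for the OVS decoders and `enum ofperr`.

```c
#include <stdint.h>

/* Flag values as defined in the ofp-actions.h hunk of patch 3/3. */
enum nx_learn_flags {
    NX_LEARN_F_SEND_FLOW_REM = 1 << 0,
    NX_LEARN_F_DELETE_LEARNED = 1 << 1,
    NX_LEARN_F_WRITE_RESULT = 1 << 2,
};

/* Hypothetical stand-in for enum ofperr. */
enum decode_error {
    DECODE_OK = 0,
    DECODE_BAD_ARGUMENT,
};

/* Checks shared by every encoding, like decode_LEARN_common(). */
static enum decode_error
check_common(uint8_t table_id)
{
    return table_id == 0xff ? DECODE_BAD_ARGUMENT : DECODE_OK;
}

/* Rejects any flag bit outside 'allowed'. */
static enum decode_error
check_flags(uint16_t flags, uint16_t allowed)
{
    return (flags & ~allowed) ? DECODE_BAD_ARGUMENT : DECODE_OK;
}

static enum decode_error
decode_learn_v1(uint8_t table_id, uint16_t flags)
{
    enum decode_error error = check_common(table_id);

    if (error) {
        return error;
    }
    return check_flags(flags, NX_LEARN_F_SEND_FLOW_REM
                              | NX_LEARN_F_DELETE_LEARNED);
}

static enum decode_error
decode_learn_v2(uint8_t table_id, uint16_t flags)
{
    enum decode_error error = check_common(table_id);

    if (error) {
        return error;
    }
    return check_flags(flags, NX_LEARN_F_SEND_FLOW_REM
                              | NX_LEARN_F_DELETE_LEARNED
                              | NX_LEARN_F_WRITE_RESULT);
}
```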



[ovs-dev] [PATCH v3 1/3] ofproto-dpif-xlate: Create XC_LEARN entry after learning.

2017-03-10 Thread Daniele Di Proietto
This will be useful in a separate commit, because learning can fail.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Ben Pfaff <b...@ovn.org>
---
 ofproto/ofproto-dpif-xlate.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 8ce6a5939..8c4b714b2 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4579,11 +4579,7 @@ xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn *learn)
 enum ofperr error;
 
 if (ctx->xin->xcache) {
-struct xc_entry *entry;
-
-entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN);
-entry->learn.ofm = xmalloc(sizeof *entry->learn.ofm);
-ofm = entry->learn.ofm;
+ofm = xmalloc(sizeof *ofm);
 } else {
 ofm = __;
 }
@@ -4617,8 +4613,22 @@ xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn *learn)
  , ofm);
 ofpbuf_uninit();
 
-if (!error && ctx->xin->allow_side_effects) {
-error = ofproto_flow_mod_learn(ofm, ctx->xin->xcache != NULL);
+if (!error) {
+if (ctx->xin->allow_side_effects) {
+error = ofproto_flow_mod_learn(ofm, ctx->xin->xcache != NULL);
+}
+
+if (ctx->xin->xcache) {
+struct xc_entry *entry;
+
+entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN);
+entry->learn.ofm = ofm;
+ofm = NULL;
+}
+}
+
+if (ctx->xin->xcache) {
+free(ofm);
 }
 
 if (error) {
-- 
2.11.0
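The patched xlate_learn_action() follows a common ownership-transfer idiom: allocate, hand the object to the cache on success and NULL the local pointer, then free unconditionally, relying on free(NULL) being a no-op. A self-contained sketch of that control flow; `struct flow_mod`, `struct cache` and `learn_once()` are hypothetical stand-ins for `struct ofproto_flow_mod`, the xlate cache and the real function.

```c
#include <stdbool.h>
#include <stdlib.h>

#define CACHE_MAX 8

/* Hypothetical stand-ins for struct ofproto_flow_mod and the xlate cache. */
struct flow_mod {
    int priority;
};

struct cache {
    struct flow_mod *entries[CACHE_MAX];
    int n;
};

/* With a cache, the flow mod is heap-allocated and attached to the cache
 * only after learning succeeds; otherwise it is freed.  Without a cache,
 * a stack object is used and nothing is freed. */
static int
learn_once(struct cache *cache, bool learn_ok)
{
    struct flow_mod stub;
    struct flow_mod *fm = cache ? malloc(sizeof *fm) : &stub;
    int error;

    if (!fm) {
        return -1;
    }
    fm->priority = 100;
    error = learn_ok ? 0 : 1;   /* Stand-in for the learn outcome. */

    if (!error && cache && cache->n < CACHE_MAX) {
        cache->entries[cache->n++] = fm;
        fm = NULL;              /* The cache now owns it. */
    }
    if (cache) {
        free(fm);               /* No-op if ownership was transferred. */
    }
    return error;
}
```

Creating the cache entry only after the learn has run is what lets a later commit skip the entry when learning fails.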



[ovs-dev] [PATCH v3 0/3] Learn action limit

2017-03-10 Thread Daniele Di Proietto
This series adds the possibility to limit the number of flows learned
by a learn action.  After the learn action executes, the pipeline can
read the result (to know whether the limit was exceeded).
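A minimal sketch of the limit semantics described above. It assumes, from the `ofpact_learn` comments in patch 3/3 and the v1->v2 notes, that the limit is checked per (table, cookie) pair and also counts controller-installed flows, that `limit=0` means no limit, and that the limit counts as reached once the number of matching flows equals it; `struct flow_table_model` and `learn_with_limit()` are hypothetical names, not OVS code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one flow table: only each flow's cookie is kept,
 * which is all a per-cookie limit check needs. */
#define MODEL_MAX_FLOWS 64

struct flow_table_model {
    uint64_t cookies[MODEL_MAX_FLOWS];
    size_t n_flows;
};

/* Counts flows carrying 'cookie', whether learned or controller-installed. */
static size_t
count_cookie(const struct flow_table_model *t, uint64_t cookie)
{
    size_t n = 0;

    for (size_t i = 0; i < t->n_flows; i++) {
        if (t->cookies[i] == cookie) {
            n++;
        }
    }
    return n;
}

/* Tries to learn one flow with 'cookie' and returns the bit that would be
 * written to 'result_dst': 1 if the flow was added, 0 if it was not.
 * Assumptions: limit == 0 means "no limit", and the limit is reached once
 * the count equals it. */
static uint8_t
learn_with_limit(struct flow_table_model *t, uint64_t cookie, uint32_t limit)
{
    if (limit && count_cookie(t, cookie) >= limit) {
        return 0;               /* Limit hit: no new flow. */
    }
    if (t->n_flows >= MODEL_MAX_FLOWS) {
        return 0;               /* Capacity of this model, not of OVS. */
    }
    t->cookies[t->n_flows++] = cookie;
    return 1;
}
```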

v2->v3:
* When the learned rule has expired and there are no packets, delete the
  ukey instead of slowpathing it.  This should make the next upcall a
  miss upcall, so OVS should immediately install a flow.  Suggested by
  Joe (thanks!).
* NXAST_RAW_LEARN2 is now subtype 45 instead of 44.

v1->v2:
* 'limit' counts both learned flows and flows installed by the controller,
  as suggested by Ben.
* Don't keep a reference to the counter in ukeys.
* Squash tests, OpenFlow interface changes and ofproto implementation
  into a single commit.
* The new cookie-counters module is no longer used, so it's removed.
* Fix memory leak in ofproto_flow_mod_learn(): we have to call
  ofproto_flow_mod_uninit() if we don't call ofproto_flow_mod_learn_start().
* Simplify ofp-actions changes according to Ben's comments (thanks!).


Daniele Di Proietto (3):
  ofproto-dpif-xlate: Create XC_LEARN entry after learning.
  ofp-actions: Factor out decode_LEARN_{common,spec}().
  ofp-actions: Add limit to learn action.

 include/openvswitch/ofp-actions.h  |  12 +++
 lib/learn.c|  24 +
 lib/ofp-actions.c  | 146 
 ofproto/ofproto-dpif-upcall.c  |   4 +
 ofproto/ofproto-dpif-xlate-cache.c |   3 +-
 ofproto/ofproto-dpif-xlate-cache.h |   1 +
 ofproto/ofproto-dpif-xlate.c   |  46 +++--
 ofproto/ofproto-dpif-xlate.h   |   3 +
 ofproto/ofproto-provider.h |   3 +-
 ofproto/ofproto.c  |  46 +++--
 tests/learn.at | 191 +
 tests/ofp-actions.at   |  14 +++
 utilities/ovs-ofctl.8.in   |  16 
 13 files changed, 474 insertions(+), 35 deletions(-)

-- 
2.11.0



Re: [ovs-dev] [PATCH RFC 4/4] dpif-netdev: Don't uninit emc on reload.

2017-03-09 Thread Daniele Di Proietto
2017-02-21 6:49 GMT-08:00 Ilya Maximets :
> There are many reasons for reloading pmd threads:
> * Reconfiguration of one of the ports.
> * Adjusting static_tx_qid.
> * Adding new tx/rx ports.
>
> In many cases the EMC is still useful after a reload, and uninit
> will only lead to unnecessary upcalls/classifier lookups.
>
> Such behaviour slows down the datapath. Uninit itself slows
> down the reload path. All these factors lead to additional
> unexpected latencies/drops on events not directly connected
> to the current PMD thread.
>
> Let's not uninitialize the EMC on the reload path.
> 'emc_cache_slow_sweep()' and replacements should free all
> the old/unwanted entries.
>
> Signed-off-by: Ilya Maximets 
> ---
>
> In discussion of original patch[1] Pravin Shelar said:
> '''
>   emc cache flows should be free on reload, otherwise flows
>   might stick around for longer time.
> '''
>
> After that (for another reason) slow sweep was introduced.
> There are a few reasons to move uninit out of the reload path,
> described in the commit message.
>
> So, this patch is sent as an RFC, because I wanted to hear some
> opinions about the current situation and the drawbacks of such a
> solution.
>
> Any thoughts?
> Thanks.
>
> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2014-August/287852.html
>

I'm in favor of this approach. Now that we have slow sweep we don't need to
clear it at every reload.

Just curious, did you measure any difference in reload time with this patch?

We could init and uninit the EMC from the main thread in
dp_netdev_configure_pmd() and dp_netdev_del_pmd().  This way we wouldn't
need a special case for the non-pmd thread in the code, but it would be slower.
In any case, it seems fine to me.
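For reference, a sketch of the control-flow change in the patch: the init moves above the `reload:` label and the uninit below the loop, so cached entries survive reloads. `struct fake_emc` and `pmd_main_model()` are hypothetical, and the slow sweep that trims stale entries in the real EMC is not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the EMC: counts init/uninit calls and how
 * many entries are cached. */
struct fake_emc {
    int n_init;
    int n_uninit;
    int entries;
};

static void
fake_emc_init(struct fake_emc *e)
{
    e->n_init++;
    e->entries = 0;
}

static void
fake_emc_uninit(struct fake_emc *e)
{
    e->n_uninit++;
    e->entries = 0;
}

/* Simplified shape of pmd_thread_main() after the patch: init happens
 * before the 'reload' label and uninit only on exit.  Returns how many
 * entries were cached just before exit. */
static int
pmd_main_model(struct fake_emc *e, int n_reloads)
{
    int reloads_left = n_reloads;
    int cached;
    bool exiting;

    fake_emc_init(e);
reload:
    e->entries += 10;           /* Entries added between two reloads. */
    exiting = reloads_left-- == 0;
    if (!exiting) {
        goto reload;
    }
    cached = e->entries;
    fake_emc_uninit(e);
    return cached;
}
```

With three reloads the model initializes once and keeps all 40 entries; with init/uninit inside the loop it would restart from zero on every reload.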

Thanks,

Daniele

>  lib/dpif-netdev.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index e2b4f39..3bd568d 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3648,9 +3648,9 @@ pmd_thread_main(void *f_)
>  ovs_numa_thread_setaffinity_core(pmd->core_id);
>  dpdk_set_lcore_id(pmd->core_id);
>  poll_cnt = pmd_load_queues_and_ports(pmd, _list);
> +emc_cache_init(>flow_cache);
>  reload:
>  pmd_alloc_static_tx_qid(pmd);
> -emc_cache_init(>flow_cache);
>
>  /* List port/core affinity */
>  for (i = 0; i < poll_cnt; i++) {
> @@ -3697,13 +3697,13 @@ reload:
>   * reloading the updated configuration. */
>  dp_netdev_pmd_reload_done(pmd);
>
> -emc_cache_uninit(>flow_cache);
>  pmd_free_static_tx_qid(pmd);
>
>  if (!exiting) {
>  goto reload;
>  }
>
> +emc_cache_uninit(>flow_cache);
>  free(poll_list);
>  pmd_free_cached_ports(pmd);
>  return NULL;
> --
> 2.7.4
>


Re: [ovs-dev] [PATCH 3/4] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

2017-03-09 Thread Daniele Di Proietto
2017-02-21 6:49 GMT-08:00 Ilya Maximets :
> Reconfiguration of HW NICs may lead to packet drops.
> In the current model, all physical ports are reconfigured each
> time the number of PMD threads changes. Since we no longer stop
> threads on pmd-cpu-mask changes, this patch further decreases
> ports' downtime by setting the number of wanted tx queues to the
> maximum possible, avoiding unnecessary reconfigurations.
>
> Signed-off-by: Ilya Maximets 

I haven't thought this through a lot, but the last big series we pushed
on master went exactly in the opposite direction, i.e. created one txq
for each thread in the datapath.

I thought this was a good idea because:

* On some systems with hyperthreading we can have a lot of cpus (we received
  reports of systems with 72 cores). If you want to use only a couple of cores
  you're still forced to have a lot of unused txqs, which prevents you from
  having lockless tx.
* We thought that reconfiguring the number of pmds would not be a frequent
  operation.
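To put numbers on the first point, a small sketch comparing the two sizing policies. The 16-queue NIC and the rule that tx is lockless only when every wanted queue exists are assumptions for illustration; the function names are hypothetical.

```c
#include <stdbool.h>

/* Policy on master: one txq per thread in the datapath.  'poll_threads'
 * holds the PMD threads plus the non-PMD thread. */
static int
txqs_per_thread(int n_pmd_threads)
{
    return n_pmd_threads + 1;
}

/* Policy in this patch: one txq per possible core, plus one for non-PMD
 * threads (ovs_numa_get_n_cores() + 1). */
static int
txqs_per_core(int n_cores)
{
    return n_cores + 1;
}

/* Assumption for this sketch: tx is lockless only if the NIC can provide
 * every wanted queue; otherwise threads must share queues under a lock. */
static bool
tx_is_lockless(int wanted_txqs, int nic_max_txqs)
{
    return wanted_txqs <= nic_max_txqs;
}
```

With 2 PMD threads on a 72-core host, the per-thread policy wants 3 txqs while the per-core policy wants 73, which a 16-queue NIC cannot provide.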

Why do you want to reconfigure the threads that often?  Is it to be able to
support more throughput quickly?  In this case I think we shouldn't use the
number of cpus, but something else coming from the user, so that the
administrator can balance how quickly pmd threads can be reconfigured vs how
many txqs should be on the system.  I'm not sure how to make this user
friendly though.

What do you think?

Thanks,

Daniele

> ---
>  lib/dpif-netdev.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 6e575ab..e2b4f39 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3324,7 +3324,11 @@ reconfigure_datapath(struct dp_netdev *dp)
>   * on the system and the user configuration. */
>  reconfigure_pmd_threads(dp);
>
> -wanted_txqs = cmap_count(>poll_threads);
> +/* We need 1 Tx queue for each possible cpu core. */
> +wanted_txqs = ovs_numa_get_n_cores();
> +ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
> +/* And 1 Tx queue for non-PMD threads. */
> +wanted_txqs++;
>
>  /* The number of pmd threads might have changed, or a port can be new:
>   * adjust the txqs. */
> --
> 2.7.4
>


Re: [ovs-dev] [PATCH 2/4] dpif-netdev: Incremental addition/deletion of PMD threads.

2017-03-09 Thread Daniele Di Proietto
2017-02-21 6:49 GMT-08:00 Ilya Maximets :
> Currently, changing 'pmd-cpu-mask' is a very heavy operation.
> It requires destroying all the PMD threads and creating
> them again. After that, all the threads sleep until the
> ports' redistribution is finished.
>
> This patch adds the ability to keep the datapath running while
> adjusting the number/placement of PMD threads. All unaffected
> threads will forward traffic without any additional latency.
>
> An id-pool is created for static tx queue ids to keep them sequential
> in a flexible way. The non-PMD thread will always have
> static_tx_qid = 0, as it did before.
>
> Signed-off-by: Ilya Maximets 

Thanks for the patch.

The idea looks good to me.

I'm still looking at the code, and I have one comment below.
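For readers unfamiliar with the id-pool idea, a minimal stand-in (not the lib/id-pool.h API) that always hands out the lowest free id. That property keeps static_tx_qid values dense and gives the non-PMD thread id 0 if it allocates first. `struct tiny_id_pool` and its functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for an id pool: ids 0 .. n_ids - 1, lowest free first. */
#define POOL_MAX_IDS 64

struct tiny_id_pool {
    uint32_t n_ids;
    bool used[POOL_MAX_IDS];
};

static void
tiny_pool_init(struct tiny_id_pool *p, uint32_t n_ids)
{
    p->n_ids = n_ids < POOL_MAX_IDS ? n_ids : POOL_MAX_IDS;
    for (uint32_t i = 0; i < POOL_MAX_IDS; i++) {
        p->used[i] = false;
    }
}

/* Stores the lowest free id in '*id'; returns false if the pool is full. */
static bool
tiny_pool_alloc(struct tiny_id_pool *p, uint32_t *id)
{
    for (uint32_t i = 0; i < p->n_ids; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            *id = i;
            return true;
        }
    }
    return false;
}

static void
tiny_pool_free(struct tiny_id_pool *p, uint32_t id)
{
    if (id < p->n_ids) {
        p->used[id] = false;
    }
}
```

Sized to n_cores + 1 as in create_dp_netdev(), a pool like this reuses an id freed by a deleted PMD thread for the next thread that is created.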

> ---
>  lib/dpif-netdev.c | 119 
> +-
>  tests/pmd.at  |   2 +-
>  2 files changed, 91 insertions(+), 30 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 30907b7..6e575ab 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -48,6 +48,7 @@
>  #include "fat-rwlock.h"
>  #include "flow.h"
>  #include "hmapx.h"
> +#include "id-pool.h"
>  #include "latch.h"
>  #include "netdev.h"
>  #include "netdev-vport.h"
> @@ -241,6 +242,9 @@ struct dp_netdev {
>
>  /* Stores all 'struct dp_netdev_pmd_thread's. */
>  struct cmap poll_threads;
> +/* id pool for per thread static_tx_qid. */
> +struct id_pool *tx_qid_pool;
> +struct ovs_mutex tx_qid_pool_mutex;
>
>  /* Protects the access of the 'struct dp_netdev_pmd_thread'
>   * instance for non-pmd thread. */
> @@ -514,7 +518,7 @@ struct dp_netdev_pmd_thread {
>  /* Queue id used by this pmd thread to send packets on all netdevs if
>   * XPS disabled for this netdev. All static_tx_qid's are unique and less
>   * than 'cmap_count(dp->poll_threads)'. */
> -const int static_tx_qid;
> +uint32_t static_tx_qid;
>
> struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
>  /* List of rx queues to poll. */
> @@ -594,6 +598,8 @@ static struct dp_netdev_pmd_thread *dp_netdev_get_pmd(struct dp_netdev *dp,
>unsigned core_id);
>  static struct dp_netdev_pmd_thread *
>  dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos);
> +static void dp_netdev_del_pmd(struct dp_netdev *dp,
> +  struct dp_netdev_pmd_thread *pmd);
>  static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp, bool non_pmd);
>  static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
>  static void dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
> @@ -1077,10 +1083,17 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
>  atomic_init(>emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
>
>  cmap_init(>poll_threads);
> +
> +ovs_mutex_init(>tx_qid_pool_mutex);
> +/* We need 1 Tx queue for each possible cpu core + 1 for non-PMD threads. */
> +dp->tx_qid_pool = id_pool_create(0, ovs_numa_get_n_cores() + 1);
> +
>  ovs_mutex_init_recursive(>non_pmd_mutex);
>  ovsthread_key_create(>per_pmd_key, NULL);
>
>  ovs_mutex_lock(>port_mutex);
> +/* non-PMD will be created before all other threads and will
> + * allocate static_tx_qid = 0. */
>  dp_netdev_set_nonpmd(dp);
>
>  error = do_add_port(dp, name, dpif_netdev_port_open_type(dp->class,
> @@ -1164,6 +1177,9 @@ dp_netdev_free(struct dp_netdev *dp)
>  dp_netdev_destroy_all_pmds(dp, true);
>  cmap_destroy(>poll_threads);
>
> +ovs_mutex_destroy(>tx_qid_pool_mutex);
> +id_pool_destroy(dp->tx_qid_pool);
> +
>  ovs_mutex_destroy(>non_pmd_mutex);
>  ovsthread_key_delete(dp->per_pmd_key);
>
> @@ -3175,7 +3191,10 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
>  {
>  struct dp_netdev_pmd_thread *pmd;
>  struct ovs_numa_dump *pmd_cores;
> -bool changed = false;
> +struct ovs_numa_info_core *core;
> +struct hmapx to_delete = HMAPX_INITIALIZER(_delete);
> +struct hmapx_node *node;
> +int created = 0, deleted = 0;
>
>  /* The pmd threads should be started only if there's a pmd port in the
>   * datapath.  If the user didn't provide any "pmd-cpu-mask", we start
> @@ -3188,45 +3207,62 @@ reconfigure_pmd_threads(struct dp_netdev *dp)
>  pmd_cores = ovs_numa_dump_n_cores_per_numa(NR_PMD_THREADS);
>  }
>
> -/* Check for changed configuration */
> -if (ovs_numa_dump_count(pmd_cores) != cmap_count(>poll_threads) - 1) {
> -changed = true;
> -} else {
> -CMAP_FOR_EACH (pmd, node, >poll_threads) {
> -if (pmd->core_id != NON_PMD_CORE_ID
> -&& !ovs_numa_dump_contains_core(pmd_cores,
> -pmd->numa_id,
> -pmd->core_id)) {
> -

Re: [ovs-dev] [PATCH] docs: Use DPDK 16.11.1 stable release.

2017-03-09 Thread Daniele Di Proietto
2017-03-09 13:15 GMT-08:00 Ian Stokes :
> DPDK now provides a stable release branch. Modify dpdk docs and travis linux
> build script to use the DPDK 16.11.1 stable branch to benefit from most
> recent bug fixes.
>
> Signed-off-by: Ian Stokes 

Thanks for the patch, it looks good to me.

This is for master and branch-2.7, right?

Just one comment: this appears to break the Travis build:

https://travis-ci.org/ddiproietto/ovs/jobs/209586728

I guess we need to update the --with-dpdk argument in .travis/linux-build.sh.

> ---
>  .travis/linux-build.sh   |   10 +-
>  Documentation/faq/releases.rst   |   10 +-
>  Documentation/intro/install/dpdk.rst |6 +++---
>  Documentation/topics/dpdk/vhost-user.rst |8 
>  4 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/.travis/linux-build.sh b/.travis/linux-build.sh
> index 4175d72..06c8422 100755
> --- a/.travis/linux-build.sh
> +++ b/.travis/linux-build.sh
> @@ -52,13 +52,13 @@ function install_kernel()
>  function install_dpdk()
>  {
>  if [ -n "$DPDK_GIT" ]; then
> -git clone $DPDK_GIT dpdk-$1
> -cd dpdk-$1
> -git checkout v$1
> +git clone $DPDK_GIT dpdk-stable-$1
> +cd dpdk-stable-$1
> +git checkout tags/v$1
>  else
>  wget http://fast.dpdk.org/rel/dpdk-$1.tar.gz
>  tar xzvf dpdk-$1.tar.gz > /dev/null
> -cd dpdk-$1
> +cd dpdk-stable-$1
>  fi
>  find ./ -type f | xargs sed -i 's/max-inline-insns-single=100/max-inline-insns-single=400/'
>  echo 'CONFIG_RTE_BUILD_FPIC=y' >>config/common_linuxapp
> @@ -80,7 +80,7 @@ fi
>
>  if [ "$DPDK" ]; then
>  if [ -z "$DPDK_VER" ]; then
> -DPDK_VER="16.11"
> +DPDK_VER="16.11.1"
>  fi
>  install_dpdk $DPDK_VER
>  if [ "$CC" = "clang" ]; then
> diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
> index 118c88d..98f5636 100644
> --- a/Documentation/faq/releases.rst
> +++ b/Documentation/faq/releases.rst
> @@ -152,16 +152,16 @@ Q: What DPDK version does each Open vSwitch release work with?
>  A: The following table lists the DPDK version against which the given
>  versions of Open vSwitch will successfully build.
>
> - =
> + ===
>  Open vSwitch DPDK
> - =
> + ===
>  2.2.x1.6
>  2.3.x1.6
>  2.4.x2.0
>  2.5.x2.2
> -2.6.x16.07
> -2.7.x16.11
> - =
> +2.6.x16.07.2
> +2.7.x16.11.1
> + ===
>
>  Q: I get an error like this when I configure Open vSwitch::
>
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index 3018590..b947bd5 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -64,9 +64,9 @@ Install DPDK
>  #. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
>
> $ cd /usr/src/
> -   $ wget http://fast.dpdk.org/rel/dpdk-16.11.tar.xz
> -   $ tar xf dpdk-16.11.tar.xz
> -   $ export DPDK_DIR=/usr/src/dpdk-16.11
> +   $ wget http://fast.dpdk.org/rel/dpdk-16.11.1.tar.xz
> +   $ tar xf dpdk-16.11.1.tar.xz
> +   $ export DPDK_DIR=/usr/src/dpdk-stable-16.11.1
> $ cd $DPDK_DIR
>
>  #. (Optional) Configure DPDK as a shared library
> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> index 5448bd2..ba22684 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -278,9 +278,9 @@ To begin, instantiate a guest as described in :ref:`dpdk-vhost-user` or
>  DPDK sources to VM and build DPDK::
>
>  $ cd /root/dpdk/
> -$ wget http://fast.dpdk.org/rel/dpdk-16.11.tar.xz
> -$ tar xf dpdk-16.11.tar.xz
> -$ export DPDK_DIR=/root/dpdk/dpdk-16.11
> +$ wget http://fast.dpdk.org/rel/dpdk-16.11.1.tar.xz
> +$ tar xf dpdk-16.11.1.tar.xz
> +$ export DPDK_DIR=/root/dpdk/dpdk-stable-16.11.1
>  $ export DPDK_TARGET=x86_64-native-linuxapp-gcc
>  $ export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
>  $ cd $DPDK_DIR
> @@ -364,7 +364,7 @@ Sample XML
>  
>  
>
> -  
> +  
>
>
>  
> --
> 1.7.0.7
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] dpdk: Redirect DPDK log to OVS logging subsystem.

2017-03-09 Thread Daniele Di Proietto
2017-03-06 11:28 GMT-08:00 Aaron Conole :
> Ilya Maximets  writes:
>
>> This should make it easier to have all the logs in one place.
>> 'ovs-appctl vlog' commands for the 'dpdk' module can be used
>> to configure the log level. The lower bound for DPDK logging
>> (--log-level) can still be passed through the 'dpdk-extra' field.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>
> Worked fine for me so far.  I'm going to keep running with this.
>
> Acked-by: Aaron Conole 
>
> -Aaron

Thanks a lot, I think this is very useful.

Applied to master.
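One way to do this kind of redirection is a custom stdio stream whose write callback forwards to the logging code; DPDK's rte_openlog_stream() accepts such a FILE *. A minimal sketch using glibc's fopencookie(), with a hypothetical in-memory `struct log_sink` standing in for VLOG; it is not the code from the patch.

```c
#define _GNU_SOURCE             /* fopencookie() is a glibc extension. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical sink standing in for VLOG: collects everything written. */
struct log_sink {
    char buf[256];
    size_t len;
};

/* Write callback: appends as much as fits, always NUL-terminated. */
static ssize_t
sink_write(void *cookie, const char *data, size_t size)
{
    struct log_sink *sink = cookie;
    size_t room = sizeof sink->buf - 1 - sink->len;
    size_t n = size < room ? size : room;

    memcpy(sink->buf + sink->len, data, n);
    sink->len += n;
    sink->buf[sink->len] = '\0';
    return size;                /* Report all bytes consumed. */
}

/* Returns a FILE * whose writes land in 'sink'.  A stream like this is
 * what would be handed to rte_openlog_stream(). */
static FILE *
open_log_stream(struct log_sink *sink)
{
    cookie_io_functions_t funcs = {
        .read = NULL,
        .write = sink_write,
        .seek = NULL,
        .close = NULL,
    };

    sink->len = 0;
    sink->buf[0] = '\0';
    return fopencookie(sink, "w", funcs);
}
```

In a real integration the callback would pass each message to the logging subsystem instead of buffering it, so 'ovs-appctl vlog' settings apply.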


Re: [ovs-dev] [PATCH v3] netdev-dpdk: Fix mempool segfault.

2017-03-09 Thread Daniele Di Proietto
2017-03-09 5:57 GMT-08:00 Ian Stokes <ian.sto...@intel.com>:
> The dpdk_mp_get() function can return a NULL pointer, which leads to a
> segfault when a mempool cannot be created. The lack of a return value
> check for the function netdev_dpdk_mempool_configure() when called in
> netdev_dpdk_reconfigure() can also result in a segfault, as a NULL
> pointer for the mempool will be passed to rte_eth_rx_queue_setup().
>
> Fix this by adding appropriate NULL pointer and return value checks to
> dpdk_mp_get(), netdev_dpdk_reconfigure() and dpdk_vhost_reconfigure_helper().
>
> Signed-off-by: Ian Stokes <ian.sto...@intel.com>
> Fixes: 2ae3d542 ("netdev-dpdk: Refactor dpdk_mp_get().")
> Fixes: 0072e931 ("netdev-dpdk: add support for jumbo frames")
> CC: Daniele Di Proietto <diproiet...@vmware.com>
> CC: Mark Kavanagh <mark.b.kavan...@intel.com>
>

Thanks, applied to master and branch-2.7
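The fix follows a get-or-create pattern: only a successfully created pool is cached, and NULL is turned into ENOMEM before the pool is used. A self-contained sketch; `struct mp`, `mp_get()`, `reconfigure()` and the `simulate_create_failure` switch are hypothetical and only mirror the shape of dpdk_mp_get() and its callers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mempool cache keyed by (socket_id, mtu); a singly linked
 * list stands in for dpdk_mp_list. */
struct mp {
    int socket_id;
    int mtu;
    struct mp *next;
};

static struct mp *mp_list;
static bool simulate_create_failure;

static struct mp *
mp_create(int socket_id, int mtu)
{
    if (simulate_create_failure) {
        return NULL;            /* E.g. not enough hugepage memory. */
    }
    struct mp *mp = calloc(1, sizeof *mp);
    if (mp) {
        mp->socket_id = socket_id;
        mp->mtu = mtu;
    }
    return mp;
}

/* Like the fixed dpdk_mp_get(): link only a pool that was created. */
static struct mp *
mp_get(int socket_id, int mtu)
{
    for (struct mp *mp = mp_list; mp; mp = mp->next) {
        if (mp->socket_id == socket_id && mp->mtu == mtu) {
            return mp;
        }
    }
    struct mp *mp = mp_create(socket_id, mtu);
    if (mp) {
        mp->next = mp_list;
        mp_list = mp;
    }
    return mp;
}

/* Like the reconfigure paths: fail with ENOMEM before the pool is used. */
static int
reconfigure(int socket_id, int mtu, struct mp **out)
{
    struct mp *mp = mp_get(socket_id, mtu);

    if (!mp) {
        return ENOMEM;
    }
    *out = mp;
    return 0;
}
```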

> ---
> v3
> * Remove assignments within if conditions for
>   netdev_dpdk_reconfigure(), netdev_vhost_reconfigure_helper() and
>   dpdk_mp_get()
>
> v2
> * Remove extra VLOG_ERR in netdev_dpdk_reconfigure()
> * Remove extra VLOG_ERR in netdev_vhost_reconfigure_helper()
> * Remove check for NULL mempool in netdev_vhost_reconfigure_helper() as
>   netdev_dpdk_mempool_configure() already checks and returns ENOMEM error
>   for this case.
>
> v1
> * Add NULL pointer check to dpdk_mp_get() when calling dpdk_mp_create().
> * Add return type check when calling netdev_dpdk_mempool_configure() in
>   netdev_dpdk_reconfigure().
> * Add return type check when calling netdev_dpdk_mempool_configure() in
>   netdev_vhost_reconfigure_helper()
> ---
>  lib/netdev-dpdk.c |   20 +---
>  1 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index ee53c4c..67905c4 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -530,7 +530,9 @@ dpdk_mp_get(int socket_id, int mtu)
>  }
>
>  dmp = dpdk_mp_create(socket_id, mtu);
> -ovs_list_push_back(_mp_list, >list_node);
> +if (dmp) {
> +ovs_list_push_back(_mp_list, >list_node);
> +}
>
>  out:
>  ovs_mutex_unlock(_mp_mutex);
> @@ -3131,7 +3133,10 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>
>  if (dev->mtu != dev->requested_mtu
>  || dev->socket_id != dev->requested_socket_id) {
> -netdev_dpdk_mempool_configure(dev);
> +err = netdev_dpdk_mempool_configure(dev);
> +if (err) {
> +goto out;
> +}
>  }
>
>  netdev->n_txq = dev->requested_n_txq;
> @@ -3160,6 +3165,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>  {
>  dev->up.n_txq = dev->requested_n_txq;
>  dev->up.n_rxq = dev->requested_n_rxq;
> +int err;
>
>  /* Enable TX queue 0 by default if it wasn't disabled. */
>  if (dev->tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) {
> @@ -3170,15 +3176,15 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
>
>  if (dev->requested_socket_id != dev->socket_id
>  || dev->requested_mtu != dev->mtu) {
> -if (!netdev_dpdk_mempool_configure(dev)) {
> +err = netdev_dpdk_mempool_configure(dev);
> +if (err) {
> +return err;
> +}
> +else {
>  netdev_change_seq_changed(>up);
>  }
>  }
>
> -if (!dev->dpdk_mp) {
> -return ENOMEM;
> -}
> -
>  if (netdev_dpdk_get_vid(dev) >= 0) {
>  dev->vhost_reconfigured = true;
>  }
> --
> 1.7.0.7
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch_v6 6/8] dpdk: Add missing CHECK_CONNTRACK_ALG guards.

2017-03-09 Thread Daniele Di Proietto
2017-02-16 0:47 GMT-08:00 Darrell Ball :
> Signed-off-by: Darrell Ball 
> Acked-by: Flavio Leitner 

Thanks, I applied this to master

> ---
>  tests/system-traffic.at | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/tests/system-traffic.at b/tests/system-traffic.at
> index a15e059..e97a45d 100644
> --- a/tests/system-traffic.at
> +++ b/tests/system-traffic.at
> @@ -2601,6 +2601,7 @@ m4_define([CHECK_FTP_NAT],
>  AT_SKIP_IF([test $HAVE_FTP = no])
>  CHECK_CONNTRACK()
>  CHECK_CONNTRACK_NAT()
> +CHECK_CONNTRACK_ALG()
>
>  OVS_TRAFFIC_VSWITCHD_START()
>
> @@ -2815,6 +2816,8 @@ AT_SETUP([conntrack - IPv6 FTP with NAT])
>  AT_SKIP_IF([test $HAVE_FTP = no])
>  CHECK_CONNTRACK()
>  CHECK_CONNTRACK_NAT()
> +CHECK_CONNTRACK_ALG()
> +
>  OVS_TRAFFIC_VSWITCHD_START()
>
>  ADD_NAMESPACES(at_ns0, at_ns1)
> --
> 1.9.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch_v6 4/8] dpdk: Userspace Datapath: Introduce NAT Support.

2017-03-09 Thread Daniele Di Proietto
2017-03-09 10:21 GMT-08:00 Darrell Ball <db...@vmware.com>:
>
>
> On 3/8/17, 6:14 PM, "ovs-dev-boun...@openvswitch.org on behalf of Daniele Di Proietto" <ovs-dev-boun...@openvswitch.org on behalf of diproiet...@ovn.org> wrote:
>
> 2017-02-16 0:47 GMT-08:00 Darrell Ball <dlu...@gmail.com>:
> > This patch introduces NAT support for the userspace datapath.
> > The conntrack module changes are in this patch.
> >
> > The per packet scope of lookups for NAT and un_NAT is at
> > the bucket level rather than global. One hash table is
> > introduced to support create/delete handling. The create/delete
> > events may be further optimized, if the need becomes clear.
> >
> > Some NAT options with limited utility (persistent, random) are
> > not supported yet, but will be supported in a later patch.
> >
> > Signed-off-by: Darrell Ball <dlu...@gmail.com>
>
> Thanks for the patch. I'll keep looking at this, but since you're
> about to send another version, I had one comment below.
>
> > ---
> >  lib/conntrack-private.h |  16 +-
> >  lib/conntrack.c | 782 
> ++--
> >  lib/conntrack.h |  46 +++
> >  3 files changed, 751 insertions(+), 93 deletions(-)
> >
> > diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> > index 493865f..a7c2ae4 100644
> > --- a/lib/conntrack-private.h
> > +++ b/lib/conntrack-private.h
> > @@ -51,14 +51,23 @@ struct conn_key {
> >  uint16_t zone;
> >  };
> >
> > +struct nat_conn_key_node {
> > +struct hmap_node node;
> > +struct conn_key key;
> > +struct conn_key value;
> > +};
> > +
> >  struct conn {
> >  struct conn_key key;
> >  struct conn_key rev_key;
> >  long long expiration;
> >  struct ovs_list exp_node;
> >  struct hmap_node node;
> > -uint32_t mark;
> >  ovs_u128 label;
> > +/* XXX: consider flattening. */
> > +struct nat_action_info_t *nat_info;
> > +uint32_t mark;
> > +uint8_t conn_type;
> >  };
> >
> >  enum ct_update_res {
> > @@ -67,6 +76,11 @@ enum ct_update_res {
> >  CT_UPDATE_NEW,
> >  };
> >
> > +enum ct_conn_type {
> > +CT_CONN_TYPE_DEFAULT,
> > +CT_CONN_TYPE_UN_NAT,
> > +};
> > +
> >  struct ct_l4_proto {
> >  struct conn *(*new_conn)(struct conntrack_bucket *, struct dp_packet *pkt,
> >   long long now);
> > diff --git a/lib/conntrack.c b/lib/conntrack.c
> > index d0e106f..49760c0 100644
> > --- a/lib/conntrack.c
> > +++ b/lib/conntrack.c
> > @@ -76,6 +76,20 @@ static void set_label(struct dp_packet *, struct conn *,
> >const struct ovs_key_ct_labels *mask);
> >  static void *clean_thread_main(void *f_);
> >
> > +static struct nat_conn_key_node *
> > +nat_conn_keys_lookup(struct hmap *nat_conn_keys,
> > + const struct conn_key *key,
> > + uint32_t basis);
> > +
> > +static void
> > +nat_conn_keys_remove(struct hmap *nat_conn_keys,
> > +const struct conn_key *key,
> > +uint32_t basis);
> > +
> > +static bool
> > +nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
> > +   struct conn *nat_conn);
> > +
> >  static struct ct_l4_proto *l4_protos[] = {
> >  [IPPROTO_TCP] = &ct_proto_tcp,
> >  [IPPROTO_UDP] = &ct_proto_other,
> > @@ -90,7 +104,7 @@ long long ct_timeout_val[] = {
> >  };
> >
> >  /* If the total number of connections goes above this value, no new connections
> > - * are accepted */
> > + * are accepted; this is for CT_CONN_TYPE_DEFAULT connections. */
> >  #define DEFAULT_N_CONN_LIMIT 300
> >
> >  /* Initializes the connection tracker 'ct'.  The caller is responsible for
> > @@ -101,6 +115,11 @@ conntrack_init(struct conntrack *ct)
> >  unsigned i, j;
> >  long long now = time_msec();
> >
> > +ct_rwlock_init(&ct->nat_resources_

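For readers less familiar with the approach: struct nat_conn_key_node above pairs one conn_key ('key') with another ('value') in a hash map, and lookup and removal are keyed on 'key' plus a hash basis. The following is a minimal, self-contained sketch of such a map. It is illustrative only and does not use the OVS hmap API. struct key, struct key_map, the FNV-1a hash and the fixed bucket count are all assumptions, not the patch's code. Refusing duplicate keys on insert is one plausible way a NAT tuple selector could detect that a candidate tuple is already in use; that is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative stand-in for struct conn_key (not the OVS layout). */
struct key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint16_t zone;
};

/* One entry mapping 'key' to 'value', like nat_conn_key_node. */
struct key_node {
    struct key_node *next;   /* Next node in the same bucket. */
    uint32_t hash;           /* Cached hash of 'key'. */
    struct key key;
    struct key value;
};

#define N_BUCKETS 64

struct key_map {
    struct key_node *buckets[N_BUCKETS];
};

/* Mixes the four bytes of 'v' into FNV-1a state 'h'. */
static uint32_t
fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Hashes field by field so struct padding never affects the result. */
static uint32_t
key_hash(const struct key *k, uint32_t basis)
{
    uint32_t h = 2166136261u ^ basis;
    h = fnv1a_u32(h, k->src_ip);
    h = fnv1a_u32(h, k->dst_ip);
    h = fnv1a_u32(h, ((uint32_t) k->src_port << 16) | k->dst_port);
    return fnv1a_u32(h, k->zone);
}

static bool
key_equal(const struct key *a, const struct key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip
           && a->src_port == b->src_port && a->dst_port == b->dst_port
           && a->zone == b->zone;
}

static struct key_node *
key_map_lookup(const struct key_map *map, const struct key *k, uint32_t basis)
{
    uint32_t h = key_hash(k, basis);
    for (struct key_node *n = map->buckets[h % N_BUCKETS]; n; n = n->next) {
        if (n->hash == h && key_equal(&n->key, k)) {
            return n;
        }
    }
    return NULL;
}

/* Inserts 'k' -> 'v'.  Returns false, leaving the map unchanged, if 'k'
 * is already present or allocation fails. */
static bool
key_map_insert(struct key_map *map, const struct key *k, const struct key *v,
               uint32_t basis)
{
    if (key_map_lookup(map, k, basis)) {
        return false;
    }
    struct key_node *n = malloc(sizeof *n);
    if (!n) {
        return false;
    }
    n->hash = key_hash(k, basis);
    n->key = *k;
    n->value = *v;
    n->next = map->buckets[n->hash % N_BUCKETS];
    map->buckets[n->hash % N_BUCKETS] = n;
    return true;
}

/* Unlinks and frees the node for 'k', if there is one. */
static void
key_map_remove(struct key_map *map, const struct key *k, uint32_t basis)
{
    uint32_t h = key_hash(k, basis);
    for (struct key_node **pp = &map->buckets[h % N_BUCKETS]; *pp;
         pp = &(*pp)->next) {
        if ((*pp)->hash == h && key_equal(&(*pp)->key, k)) {
            struct key_node *n = *pp;
            *pp = n->next;
            free(n);
            return;
        }
    }
}
```

Chaining through a per-node 'next' pointer means removal needs no tombstones. A production table would also resize as it grows, which this sketch omits.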
Re: [ovs-dev] [patch_v6 3/8] dpdk: Remove batch sorting in userspace conntrack.

2017-03-08 Thread Daniele Di Proietto
Thanks for posting this as a separate patch, it makes the review easier.

The idea and the patch look good to me.

One comment:

With this code we don't need the ctxs array; we can just use a single
ctx in the for loop.

Other than that:

Acked-by: Daniele Di Proietto <diproiet...@vmware.com>

2017-02-21 9:36 GMT-08:00 Flavio Leitner <f...@sysclose.org>:
> On Thu, Feb 16, 2017 at 12:47:34AM -0800, Darrell Ball wrote:
>> Packet batch sorting is removed for three reasons:
>>
>> 1) The following NAT patches change the locking marshalling, so
>>    batching loses its benefit.
>>
>> 2) For real mixtures of flows, in either hypervisors or
>>    gateways, batch sorting won't provide any benefit and
>>    will just be a tax.
>>
>> 3) Code clarity.
>>
>> Signed-off-by: Darrell Ball <dlu...@gmail.com>
>> ---
>
> I can't speak to the real performance impact, but I agree
> with the second point.
>
> Acked-by: Flavio Leitner <f...@sysclose.org>
>
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 4/7] ofproto: New cookie-counters module.

2017-03-08 Thread Daniele Di Proietto

On 07/03/2017 10:35, "Ben Pfaff" <b...@ovn.org> wrote:

>On Fri, Feb 24, 2017 at 06:57:58PM -0800, Daniele Di Proietto wrote:
>> The new module will be used by ofproto to keep track of the number of
>> learned flows with the same cookie in the same table.
>> 
>> The counters are used to implement limits for the learn action.
>> 
>> The module implements its own internal locking, because the counters can
>> be increased/decreased from handlers and revalidators.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>There's an existing hindex of flows by their cookies, the 'cookies'
>member of struct ofproto, and even a 'learned_cookies' hmap.  Is there a
>graceful way to avoid having more indexes of flows by cookie?

Good point, it looks redundant.

This module was necessary because of the extra locking requirements.  The
other maps were only protected by ofproto_mutex, while this one had its own
mutex.  With the different design in v2 this won't be necessary at all.
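For reference, the idea behind the v1 cookie-counters module (a per-table, per-cookie flow count that both handlers and revalidators update) fits in a few lines. This is a hypothetical sketch, not the removed module's API. It substitutes a C11 compare-and-swap loop for the module's dedicated mutex, and struct learn_cookie_counter and its functions are illustrative names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical count of flows sharing one (table_id, cookie) pair. */
struct learn_cookie_counter {
    uint8_t table_id;
    uint64_t cookie;
    atomic_uint n_flows;
};

static void
learn_cookie_counter_init(struct learn_cookie_counter *c, uint8_t table_id,
                          uint64_t cookie)
{
    c->table_id = table_id;
    c->cookie = cookie;
    atomic_init(&c->n_flows, 0);
}

/* Adds one flow unless that would exceed 'limit'.  The CAS loop makes the
 * check and the increment a single atomic step, so two concurrent callers
 * cannot both slip past the limit. */
static bool
learn_cookie_counter_try_inc(struct learn_cookie_counter *c,
                             unsigned int limit)
{
    unsigned int cur = atomic_load(&c->n_flows);

    do {
        if (cur >= limit) {
            return false;
        }
    } while (!atomic_compare_exchange_weak(&c->n_flows, &cur, cur + 1));
    return true;
}

/* Removes one flow from the count. */
static void
learn_cookie_counter_dec(struct learn_cookie_counter *c)
{
    unsigned int old = atomic_fetch_sub(&c->n_flows, 1);

    assert(old > 0);
    (void) old;
}
```

On failure, atomic_compare_exchange_weak() reloads 'cur', so the limit check always runs against the latest value.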


Re: [ovs-dev] [PATCH 6/7] ofproto: Implement learning limit.

2017-03-08 Thread Daniele Di Proietto

On 07/03/2017 10:39, "Ben Pfaff" <b...@ovn.org> wrote:

>On Fri, Feb 24, 2017 at 06:58:00PM -0800, Daniele Di Proietto wrote:
>> With this commit we honor the newly introduced limit to the learn
>> action.
>> 
>> When learning a new rule (with the limit set), the rule will hold a
>> reference to a counter.  The ukey that learned the rule will also keep
>> the same reference, so the counter will be decremented when both the
>> ukey and the original rule have been deleted.
>> 
>> This means that there's a small window between the learn flow expiry and
>> the next revalidation in which new flows are not learned because OVS
>> thinks that we would exceed the limit.  I think this is better than the
>> alternative: if we allowed learning in that interval we would not be
>> strictly enforcing the limit, because we still have the datapath flows
>> that are passing traffic for expired rules.
>> 
>> There's a small corner case when we have to slowpath the ukey.  This
>> happens when:
>> * The learned rule has expired (or has been deleted).
>> * The ukey that learned the rule is still in the datapath.
>> * No packets hit the datapath flow recently.
>> In this case we cannot relearn the rule (because there are no new
>> packets), and the actions might depend on the learn execution, so the
>> only option is to slowpath the ukey.  I don't think this has big
>> performance implications since it's done only for ukeys with no traffic.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>You mentioned face-to-face that a change I suggested earlier in the
>series would simplify this one.  If it's OK, then I'll wait to review
>this patch and patch 7 until you've managed to do the simplification.
>
>Thanks,
>
>Ben.

Thanks a lot for looking at this, Ben.

I've sent a v2 here:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-March/329524.html


[ovs-dev] [PATCH v2 3/3] ofp-actions: Add limit to learn action.

2017-03-08 Thread Daniele Di Proietto
This commit adds a new feature to the learn action: the ability to
limit the number of learned flows.

To be compatible with users of the old learn action, a new structure is
introduced as well as a new OpenFlow raw action number.

There's a small corner case when we have to slowpath the ukey.  This
happens when:
* The learned rule has expired (or has been deleted).
* The ukey that learned the rule is still in the datapath.
* No packets hit the datapath flow recently.
In this case we cannot relearn the rule (because there are no new
packets), and the actions might depend on the learn execution, so the
only option is to slowpath the ukey.  I don't think this has big
performance implications since it's done only for ukeys with no traffic.
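The three conditions listed above reduce to a single predicate. Here is a hypothetical sketch of that check. The struct and function names are illustrative and not part of ofproto-dpif-upcall.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of a ukey as seen by the revalidator. */
struct ukey_snapshot {
    bool learned_rule_present; /* The rule the ukey learned still exists. */
    bool in_datapath;          /* The datapath flow is still installed. */
    bool recent_packets;       /* Packets hit the flow since the last check. */
};

/* True in the corner case: the rule is gone, the datapath flow remains,
 * and no new packet is available to relearn the rule, so the only safe
 * option is to slow-path the flow. */
static bool
ukey_must_slowpath(const struct ukey_snapshot *s)
{
    return !s->learned_rule_present && s->in_datapath && !s->recent_packets;
}
```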

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 include/openvswitch/ofp-actions.h  |  12 +++
 lib/learn.c|  24 +
 lib/odp-util.h |   6 +-
 lib/ofp-actions.c  |  88 -
 ofproto/ofproto-dpif-xlate-cache.c |   3 +-
 ofproto/ofproto-dpif-xlate-cache.h |   1 +
 ofproto/ofproto-dpif-xlate.c   |  22 -
 ofproto/ofproto-provider.h |   3 +-
 ofproto/ofproto.c  |  46 +++--
 tests/learn.at | 191 +
 tests/ofp-actions.at   |  14 +++
 utilities/ovs-ofctl.8.in   |  16 
 12 files changed, 412 insertions(+), 14 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
index 88f573dcd..63abbb75b 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -652,6 +652,10 @@ struct ofpact_resubmit {
  * If NX_LEARN_F_SEND_FLOW_REM is set, then the learned flows will have their
  * OFPFF_SEND_FLOW_REM flag set.
  *
+ * If NX_LEARN_F_WRITE_RESULT is set, then the actions will write whether the
+ * learn operation succeeded to a single bit.  If the learn is successful the
+ * bit will be set, otherwise (e.g. if the limit is hit) the bit will be unset.
+ *
  * If NX_LEARN_F_DELETE_LEARNED is set, then removing this action will delete
  * all the flows from the learn action's 'table_id' that have the learn
  * action's 'cookie'.  Important points:
@@ -677,6 +681,7 @@ struct ofpact_resubmit {
 enum nx_learn_flags {
 NX_LEARN_F_SEND_FLOW_REM = 1 << 0,
 NX_LEARN_F_DELETE_LEARNED = 1 << 1,
+NX_LEARN_F_WRITE_RESULT = 1 << 2,
 };
 
 #define NX_LEARN_N_BITS_MASK0x3ff
@@ -740,6 +745,13 @@ struct ofpact_learn {
 ovs_be64 cookie;   /* Cookie for new flow. */
 uint16_t fin_idle_timeout; /* Idle timeout after FIN, if nonzero. */
 uint16_t fin_hard_timeout; /* Hard timeout after FIN, if nonzero. */
+/* If the number of flows on 'table_id' with 'cookie' exceeds this,
+ * the action will not add a new flow. */
+uint32_t limit;
+/* Used only if 'flags' has NX_LEARN_F_WRITE_RESULT.  If the execution
+ * failed to install a new flow because 'limit' is exceeded,
+ * result_dst will be set to 0, otherwise to 1. */
+struct mf_subfield result_dst;
 );
 
 struct ofpact_learn_spec specs[];
diff --git a/lib/learn.c b/lib/learn.c
index ce52c35f2..50a478fe1 100644
--- a/lib/learn.c
+++ b/lib/learn.c
@@ -406,6 +406,22 @@ learn_parse__(char *orig, char *arg, struct ofpbuf *ofpacts)
 learn->flags |= NX_LEARN_F_SEND_FLOW_REM;
 } else if (!strcmp(name, "delete_learned")) {
 learn->flags |= NX_LEARN_F_DELETE_LEARNED;
+} else if (!strcmp(name, "limit")) {
+learn->limit = atoi(value);
+} else if (!strcmp(name, "result_dst")) {
+char *error;
+learn->flags |= NX_LEARN_F_WRITE_RESULT;
+error = mf_parse_subfield(&learn->result_dst, value);
+if (error) {
+return error;
+}
+if (!learn->result_dst.field->writable) {
+return xasprintf("%s is read-only", value);
+}
+if (learn->result_dst.n_bits != 1) {
+return xasprintf("result_dst in 'learn' action must be a "
+ "single bit");
+}
 } else {
 struct ofpact_learn_spec *spec;
 char *error;
@@ -487,6 +503,14 @@ learn_format(const struct ofpact_learn *learn, struct ds *s)
 ds_put_format(s, ",%scookie=%s%#"PRIx64,
   colors.param, colors.end, ntohll(learn->cookie));
 }
+if (learn->limit != 0) {
+ds_put_format(s, ",%slimit=%s%"PRIu32,
+  colors.param, colors.end, learn->limit);
+}
+if (learn->flags & NX_LEARN_F_WRITE_RESULT) {
+ds_put_format(s, ",%sresult_dst=%s", colors.param, colors.end);
+mf_format_su

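learn_parse__() above converts 'limit' with atoi(), which does no error checking. For example, "abc" silently becomes 0, and "-1" wraps to 4294967295 once it is stored in the uint32_t field. A stricter conversion could look like the helper below. parse_learn_limit() is hypothetical and not an OVS function; OVS's existing string-to-integer helpers may be a better fit in-tree.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical strict parser for the learn "limit=" value: accepts only
 * decimal digits and values that fit in 32 bits. */
static bool
parse_learn_limit(const char *value, uint32_t *limit)
{
    char *end;
    unsigned long long n;

    /* Reject empty strings, signs and leading whitespace up front, since
     * strtoull() would otherwise accept (and negate) "-1". */
    if (!value || *value < '0' || *value > '9') {
        return false;
    }
    errno = 0;
    n = strtoull(value, &end, 10);
    if (errno || *end != '\0' || n > UINT32_MAX) {
        return false;           /* Overflow, trailing garbage, or too big. */
    }
    *limit = (uint32_t) n;
    return true;
}
```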
[ovs-dev] [PATCH v2 2/3] ofp-actions: Factor out decode_LEARN_{common, spec}().

2017-03-08 Thread Daniele Di Proietto
No functional change; they will be used by the next commit.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ofp-actions.c | 58 ++-
 1 file changed, 40 insertions(+), 18 deletions(-)

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index ce80f57e8..91a7a1ade 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4313,23 +4313,14 @@ learn_min_len(uint16_t header)
 return min_len;
 }
 
-/* Converts 'nal' into a "struct ofpact_learn" and appends that struct to
- * 'ofpacts'.  Returns 0 if successful, otherwise an OFPERR_*. */
 static enum ofperr
-decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
-   enum ofp_version ofp_version OVS_UNUSED,
-   const struct vl_mff_map *vl_mff_map,
-   struct ofpbuf *ofpacts)
+decode_LEARN_common(const struct nx_action_learn *nal,
+struct ofpact_learn *learn)
 {
-struct ofpact_learn *learn;
-const void *p, *end;
-
 if (nal->pad) {
 return OFPERR_OFPBAC_BAD_ARGUMENT;
 }
 
-learn = ofpact_put_LEARN(ofpacts);
-
 learn->idle_timeout = ntohs(nal->idle_timeout);
 learn->hard_timeout = ntohs(nal->hard_timeout);
 learn->priority = ntohs(nal->priority);
@@ -4337,19 +4328,23 @@ decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
 learn->table_id = nal->table_id;
 learn->fin_idle_timeout = ntohs(nal->fin_idle_timeout);
 learn->fin_hard_timeout = ntohs(nal->fin_hard_timeout);
-
 learn->flags = ntohs(nal->flags);
-if (learn->flags & ~(NX_LEARN_F_SEND_FLOW_REM |
- NX_LEARN_F_DELETE_LEARNED)) {
-return OFPERR_OFPBAC_BAD_ARGUMENT;
-}
 
 if (learn->table_id == 0xff) {
 return OFPERR_OFPBAC_BAD_ARGUMENT;
 }
 
-end = (char *) nal + ntohs(nal->len);
-for (p = nal + 1; p != end; ) {
+return 0;
+}
+
+static enum ofperr
+decode_LEARN_specs(const void *p, const void *end,
+   const struct vl_mff_map *vl_mff_map,
+   struct ofpbuf *ofpacts)
+{
+struct ofpact_learn *learn = ofpacts->header;
+
+while (p != end) {
 struct ofpact_learn_spec *spec;
 uint16_t header = ntohs(get_be16(&p));
 
@@ -4422,6 +4417,33 @@ decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
 return 0;
 }
 
+/* Converts 'nal' into a "struct ofpact_learn" and appends that struct to
+ * 'ofpacts'.  Returns 0 if successful, otherwise an OFPERR_*. */
+static enum ofperr
+decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
+   enum ofp_version ofp_version OVS_UNUSED,
+   const struct vl_mff_map *vl_mff_map,
+   struct ofpbuf *ofpacts)
+{
+struct ofpact_learn *learn;
+enum ofperr error;
+
+learn = ofpact_put_LEARN(ofpacts);
+
+error = decode_LEARN_common(nal, learn);
+if (error) {
+return error;
+}
+
+if (learn->flags & ~(NX_LEARN_F_SEND_FLOW_REM |
+ NX_LEARN_F_DELETE_LEARNED)) {
+return OFPERR_OFPBAC_BAD_ARGUMENT;
+}
+
+return decode_LEARN_specs(nal + 1, (char *) nal + ntohs(nal->len),
+  vl_mff_map, ofpacts);
+}
+
 static void
 put_be16(struct ofpbuf *b, ovs_be16 x)
 {
-- 
2.11.0

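After this refactoring, the checks that are valid for every learn variant (pad, table_id) live in decode_LEARN_common(). The 'flags' check stays in each raw decoder, so each decoder can accept its own set of bits. Here is a minimal sketch of that split. The SK_* constants are hypothetical mirrors of the NX_LEARN_F_* values in this series. Table id 0xff is OpenFlow's reserved OFPTT_ALL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirrors of the NX_LEARN_F_* bits used in this series. */
enum {
    SK_SEND_FLOW_REM  = 1 << 0,
    SK_DELETE_LEARNED = 1 << 1,
    SK_WRITE_RESULT   = 1 << 2,
};

/* Shared check: table 0xff (OFPTT_ALL) cannot be a learn target. */
static bool
learn_table_ok(uint8_t table_id)
{
    return table_id != 0xff;
}

/* Per-decoder check: 'flags' may only use bits present in 'allowed'. */
static bool
learn_flags_ok(uint16_t flags, uint16_t allowed)
{
    return (flags & ~allowed) == 0;
}
```

The old raw action would pass SK_SEND_FLOW_REM | SK_DELETE_LEARNED as 'allowed'. A new raw action could additionally permit SK_WRITE_RESULT without touching the shared code.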


[ovs-dev] [PATCH v2 1/3] ofproto-dpif-xlate: Create XC_LEARN entry after learning.

2017-03-08 Thread Daniele Di Proietto
This will be useful in a separate commit, because learning can fail.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
Acked-by: Ben Pfaff <b...@ovn.org>
---
 ofproto/ofproto-dpif-xlate.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index eda34f044..0912ee38c 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4502,11 +4502,7 @@ xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn *learn)
 enum ofperr error;
 
 if (ctx->xin->xcache) {
-struct xc_entry *entry;
-
-entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN);
-entry->learn.ofm = xmalloc(sizeof *entry->learn.ofm);
-ofm = entry->learn.ofm;
+ofm = xmalloc(sizeof *ofm);
 } else {
 ofm = &ofm__;
 }
@@ -4540,8 +4536,22 @@ xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn *learn)
 &fm, ofm);
 ofpbuf_uninit(&ofpacts);
 
-if (!error && ctx->xin->allow_side_effects) {
-error = ofproto_flow_mod_learn(ofm, ctx->xin->xcache != NULL);
+if (!error) {
+if (ctx->xin->allow_side_effects) {
+error = ofproto_flow_mod_learn(ofm, ctx->xin->xcache != NULL);
+}
+
+if (ctx->xin->xcache) {
+struct xc_entry *entry;
+
+entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN);
+entry->learn.ofm = ofm;
+ofm = NULL;
+}
+}
+
+if (ctx->xin->xcache) {
+free(ofm);
 }
 
 if (error) {
-- 
2.11.0

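The patch follows an ownership-transfer pattern. With a translation cache, the flow mod is heap-allocated. It is handed to a new XC_LEARN entry only when learning succeeds, and freeing a NULLed pointer makes the final free() a no-op in that case. Below is a self-contained sketch of the same control flow. struct fm_sketch and struct cache_entry_sketch are hypothetical stand-ins for struct ofproto_flow_mod and struct xc_entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ofproto_flow_mod and struct xc_entry. */
struct fm_sketch {
    int id;
};

struct cache_entry_sketch {
    struct fm_sketch *ofm;     /* Owned by the entry once set. */
};

/* With a cache, 'ofm' is heap-allocated and given to 'entry' only if
 * learning succeeded, otherwise it is freed.  Without a cache it lives on
 * the stack and is never freed. */
static bool
learn_sketch(bool use_cache, bool learn_ok, struct cache_entry_sketch *entry)
{
    struct fm_sketch ofm__, *ofm;

    ofm = use_cache ? malloc(sizeof *ofm) : &ofm__;
    if (!ofm) {
        return false;
    }
    ofm->id = 1;

    if (learn_ok && use_cache) {
        entry->ofm = ofm;      /* Transfer ownership to the cache entry... */
        ofm = NULL;            /* ...so the free() below is a no-op. */
    }
    if (use_cache) {
        free(ofm);             /* Frees only if ownership was not taken. */
    }
    return learn_ok;
}
```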


[ovs-dev] [PATCH v2 0/3] Learn action limit

2017-03-08 Thread Daniele Di Proietto
This series makes it possible to limit the number of flows learned by a
learn action.  After the learn action executes, the pipeline can read the
result (to know whether the limit was exceeded).
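As an illustration of what the pipeline sees: with result_dst=reg0[0] (the form used in this series' tests), a successful learn sets bit 0 of reg0 and a refused one clears it, so later flows can match on reg0=1 or reg0=0. The helper below is a hypothetical sketch of that one-bit write, not how OVS actually writes subfields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores the learn outcome in bit 'ofs' of a 32-bit
 * register (ofs 0 for reg0[0]) and leaves every other bit untouched. */
static uint32_t
write_result_bit(uint32_t reg, unsigned int ofs, bool learned)
{
    assert(ofs < 32);
    uint32_t mask = UINT32_C(1) << ofs;

    return learned ? (reg | mask) : (reg & ~mask);
}
```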

v1->v2:
* 'limit' counts both learned flows and flows installed by the controller,
  as suggested by Ben.
* Don't keep a reference to the counter in ukeys
* Squash tests, openflow interface changes and ofproto implementation
  into a single commit
* The new cookie-counters module is not used anymore, therefore it's removed
* Fix memory leak in ofproto_flow_mod_learn(): we have to call
  ofproto_flow_mod_uninit() if we don't call ofproto_flow_mod_learn_start().
* Simplify ofp-actions changes according to Ben's comments (thanks!)


Daniele Di Proietto (3):
  ofproto-dpif-xlate: Create XC_LEARN entry after learning.
  ofp-actions: Factor out decode_LEARN_{common,spec}().
  ofp-actions: Add limit to learn action.

 include/openvswitch/ofp-actions.h  |  12 +++
 lib/learn.c|  24 +
 lib/odp-util.h |   6 +-
 lib/ofp-actions.c  | 146 
 ofproto/ofproto-dpif-xlate-cache.c |   3 +-
 ofproto/ofproto-dpif-xlate-cache.h |   1 +
 ofproto/ofproto-dpif-xlate.c   |  42 ++--
 ofproto/ofproto-provider.h |   3 +-
 ofproto/ofproto.c  |  46 +++--
 tests/learn.at | 191 +
 tests/ofp-actions.at   |  14 +++
 utilities/ovs-ofctl.8.in   |  16 
 12 files changed, 467 insertions(+), 37 deletions(-)

-- 
2.11.0



Re: [ovs-dev] [PATCH v2] conntrack: Fix checks for TCP, UDP, and IPv6 header sizes.

2017-03-06 Thread Daniele Di Proietto
2017-03-03 21:18 GMT-08:00 Ben Pfaff <b...@ovn.org>:
> Otherwise a malformed packet could cause a read up to about 40 bytes past
> the end of the packet.  The packet would still likely be dropped because
> of checksum verification.
>
> Reported-by: Bhargava Shastry <bshas...@sec.t-labs.tu-berlin.de>
> Signed-off-by: Ben Pfaff <b...@ovn.org>

Acked-by: Daniele Di Proietto <diproiet...@vmware.com>

> ---
> v1->v2: Eliminate duplicate check in extract_l3_ipv6().  Thanks Daniele!
>
>  lib/conntrack.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 9bea3d93e4ad..677c0d2a3cdc 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -568,15 +568,15 @@ extract_l3_ipv6(struct conn_key *key, const void *data, size_t size,
>  const char **new_data)
>  {
>  const struct ovs_16aligned_ip6_hdr *ip6 = data;
> -uint8_t nw_proto = ip6->ip6_nxt;
> -uint8_t nw_frag = 0;
> -
>  if (new_data) {
>  if (OVS_UNLIKELY(size < sizeof *ip6)) {
>  return false;
>  }
>  }
>
> +uint8_t nw_proto = ip6->ip6_nxt;
> +uint8_t nw_frag = 0;
> +
>  data = ip6 + 1;
>  size -=  sizeof *ip6;
>
> @@ -623,8 +623,11 @@ check_l4_tcp(const struct conn_key *key, const void *data, size_t size,
>   const void *l3)
>  {
>  const struct tcp_header *tcp = data;
> -size_t tcp_len = TCP_OFFSET(tcp->tcp_ctl) * 4;
> +if (size < sizeof *tcp) {
> +return false;
> +}
>
> +size_t tcp_len = TCP_OFFSET(tcp->tcp_ctl) * 4;
>  if (OVS_UNLIKELY(tcp_len < TCP_HEADER_LEN || tcp_len > size)) {
>  return false;
>  }
> @@ -637,8 +640,11 @@ check_l4_udp(const struct conn_key *key, const void *data, size_t size,
>   const void *l3)
>  {
>  const struct udp_header *udp = data;
> -size_t udp_len = ntohs(udp->udp_len);
> +if (size < sizeof *udp) {
> +return false;
> +}
>
> +size_t udp_len = ntohs(udp->udp_len);
>  if (OVS_UNLIKELY(udp_len < UDP_HEADER_LEN || udp_len > size)) {
>  return false;
>  }
> --
> 2.10.2
>

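The fix comes down to ordering: confirm the buffer holds a full fixed header before dereferencing any field in it. The sketch below applies the same two checks to a raw byte buffer. udp_len_ok() and tcp_len_ok() are hypothetical helpers, not OVS's check_l4_udp()/check_l4_tcp(), and they skip any checksum handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN 8
#define TCP_HDR_LEN 20

/* Both helpers check 'size' before touching any header field.  Fields are
 * read byte by byte, so 'data' needs no particular alignment. */
static bool
udp_len_ok(const uint8_t *data, size_t size)
{
    if (size < UDP_HDR_LEN) {
        return false;
    }
    size_t udp_len = ((size_t) data[4] << 8) | data[5];  /* Bytes 4-5. */
    return udp_len >= UDP_HDR_LEN && udp_len <= size;
}

static bool
tcp_len_ok(const uint8_t *data, size_t size)
{
    if (size < TCP_HDR_LEN) {
        return false;
    }
    /* Data offset: high nibble of byte 12, counted in 32-bit words. */
    size_t tcp_len = (size_t) (data[12] >> 4) * 4;
    return tcp_len >= TCP_HDR_LEN && tcp_len <= size;
}
```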

Re: [ovs-dev] [PATCH] conntrack: Fix checks for TCP, UDP, and IPv6 header sizes.

2017-03-03 Thread Daniele Di Proietto
2017-03-03 14:08 GMT-08:00 Ben Pfaff <b...@ovn.org>:
> Otherwise a malformed packet could cause a read up to about 40 bytes past
> the end of the packet.  The packet would still likely be dropped because
> of checksum verification.
>
> Reported-by: Bhargava Shastry <bshas...@sec.t-labs.tu-berlin.de>
> Signed-off-by: Ben Pfaff <b...@ovn.org>

Oops, thanks for the fix, Ben

Fixes: a489b16854b5("conntrack: New userspace connection tracker.")

One minor comment below,

Acked-by: Daniele Di Proietto <diproiet...@vmware.com>


> ---
>  lib/conntrack.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 9bea3d93e4ad..9c1dd63648b8 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -568,6 +568,10 @@ extract_l3_ipv6(struct conn_key *key, const void *data, size_t size,
>  const char **new_data)
>  {
>  const struct ovs_16aligned_ip6_hdr *ip6 = data;
> +if (size < sizeof *ip6) {
> +return false;
> +}
> +

We could read 'ip6->ip6_nxt' even though there's not enough data.  It cannot
happen for regular TCP and UDP packets (those are covered by
miniflow_extract), but only when parsing the nested l3 header in an ICMP
error message.

The code has the same check two lines below; maybe we can reuse that.
Technically the check is necessary only if new_data != NULL, as explained
by the comment above, but perhaps it's clearer to always perform it.


>  uint8_t nw_proto = ip6->ip6_nxt;
>  uint8_t nw_frag = 0;
>
> @@ -623,8 +627,11 @@ check_l4_tcp(const struct conn_key *key, const void *data, size_t size,
>   const void *l3)
>  {
>  const struct tcp_header *tcp = data;
> -size_t tcp_len = TCP_OFFSET(tcp->tcp_ctl) * 4;
> +if (size < sizeof *tcp) {
> +return false;
> +}
>
> +size_t tcp_len = TCP_OFFSET(tcp->tcp_ctl) * 4;
>  if (OVS_UNLIKELY(tcp_len < TCP_HEADER_LEN || tcp_len > size)) {
>  return false;
>  }
> @@ -637,8 +644,11 @@ check_l4_udp(const struct conn_key *key, const void *data, size_t size,
>   const void *l3)
>  {
>  const struct udp_header *udp = data;
> -size_t udp_len = ntohs(udp->udp_len);
> +if (size < sizeof *udp) {
> +return false;
> +}
>
> +size_t udp_len = ntohs(udp->udp_len);
>  if (OVS_UNLIKELY(udp_len < UDP_HEADER_LEN || udp_len > size)) {
>  return false;
>  }
> --
> 2.10.2
>

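Daniele's point is that the length check has to run before 'ip6->ip6_nxt' is read, whichever caller supplied the buffer. Below is a byte-level sketch of that ordering. ipv6_next_header() is a hypothetical helper, not the patched extract_l3_ipv6().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40   /* Size of the fixed IPv6 header. */

/* Hypothetical helper: reads the Next Header byte (offset 6) only after
 * confirming that a full fixed header is present.  This matters for the
 * IPv6 header nested inside an ICMPv6 error, which miniflow_extract() has
 * not already validated. */
static bool
ipv6_next_header(const uint8_t *data, size_t size, uint8_t *nw_proto)
{
    if (size < IPV6_HDR_LEN) {
        return false;
    }
    *nw_proto = data[6];
    return true;
}
```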

[ovs-dev] [PATCH 7/7] tests: Add learn action with limit tests.

2017-02-24 Thread Daniele Di Proietto
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 tests/learn.at | 175 +
 1 file changed, 175 insertions(+)

diff --git a/tests/learn.at b/tests/learn.at
index 3f6fb5a7e..f91a662ad 100644
--- a/tests/learn.at
+++ b/tests/learn.at
@@ -626,3 +626,178 @@ NXST_FLOW reply:
 ])
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([learning action - limit])
+OVS_VSWITCHD_START
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
+add_of_ports br0 1 2
+AT_DATA([flows.txt], [dnl
+table=0 in_port=1 actions=learn(table=1,dl_dst=dl_src,cookie=0x1,limit=1),2
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:01,dst=50:54:00:00:00:ff),eth_type(0x1234)'])
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:02,dst=50:54:00:00:00:ff),eth_type(0x1234)'])
+
+OVS_WAIT_UNTIL([ovs-ofctl dump-ports br0 2 | grep -o 'tx pkts=2' >/dev/null])
+
+AT_CHECK([ovs-ofctl dump-flows br0 table=1 | ofctl_strip | sort], [0], [dnl
+ cookie=0x1, table=1, dl_dst=50:54:00:00:00:01 actions=drop
+NXST_FLOW reply:
+])
+
+dnl Delete the learned flow
+AT_CHECK([ovs-ofctl del-flows br0 table=1])
+
+AT_CHECK([ovs-ofctl dump-flows br0 table=1 | ofctl_strip | sort], [0], [dnl
+NXST_FLOW reply:
+])
+
+ovs-appctl revalidator/wait
+
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:02,dst=50:54:00:00:00:ff),eth_type(0x1234)'])
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:01,dst=50:54:00:00:00:ff),eth_type(0x1234)'])
+
+OVS_WAIT_UNTIL([ovs-ofctl dump-ports br0 2 | grep -o 'tx pkts=4' >/dev/null])
+
+AT_CHECK([ovs-ofctl dump-flows br0 table=1 | ofctl_strip | sort], [0], [dnl
+ cookie=0x1, table=1, dl_dst=50:54:00:00:00:02 actions=drop
+NXST_FLOW reply:
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([learning action - limit result_dst])
+OVS_VSWITCHD_START
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
+add_of_ports br0 1
+AT_DATA([flows.txt], [dnl
+table=0 in_port=1 actions=learn(table=1,dl_dst=dl_src,cookie=0x1,limit=1,result_dst=reg0[[0]]),controller
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl -P nxt_packet_in --detach --no-chdir --pidfile 2> ofctl_monitor.log])
+
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:01,dst=50:54:00:00:00:ff),eth_type(0x1234)'])
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 'in_port(1),eth(src=50:54:00:00:00:02,dst=50:54:00:00:00:ff),eth_type(0x1234)'])
+
+OVS_WAIT_UNTIL([test `wc -l < ofctl_monitor.log` -ge 4])
+OVS_WAIT_UNTIL([ovs-appctl -t ovs-ofctl exit])
+
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=14 reg0=0x1,in_port=1 (via action) data_len=14 (unbuffered)
+vlan_tci=0x,dl_src=50:54:00:00:00:01,dl_dst=50:54:00:00:00:ff,dl_type=0x1234
+NXT_PACKET_IN (xid=0x0): cookie=0x0 total_len=14 in_port=1 (via action) data_len=14 (unbuffered)
+vlan_tci=0x,dl_src=50:54:00:00:00:02,dl_dst=50:54:00:00:00:ff,dl_type=0x1234
+])
+
+AT_CHECK([ovs-ofctl dump-flows br0 table=1 | ofctl_strip | sort], [0], [dnl
+ cookie=0x1, table=1, dl_dst=50:54:00:00:00:01 actions=drop
+NXST_FLOW reply:
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([learning action - different limits])
+OVS_VSWITCHD_START
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg])
+add_of_ports br0 1 2 3
+AT_DATA([flows.txt], [dnl
+table=0 in_port=1 udp,actions=learn(table=11,dl_type=0x0800,nw_proto=17,udp_src=udp_dst,limit=1,result_dst=reg0[[0]]),resubmit(,1)
+table=0 in_port=2 udp,actions=learn(table=12,dl_type=0x0800,nw_proto=17,udp_src=udp_dst,limit=10,result_dst=reg0[[0]]),resubmit(,1)
+table=0 in_port=3 udp,actions=learn(table=13,dl_type=0x0800,nw_proto=17,udp_src=udp_dst,limit=20,result_dst=reg0[[0]]),resubmit(,1)
+dnl
+dnl These flows simply count the packets that executed a successful learn action:
+dnl
+table=1 cookie=1,reg0=1,in_port=1 actions=drop
+table=1 cookie=2,reg0=1,in_port=2 actions=drop
+table=1 cookie=3,reg0=1,in_port=3 actions=drop
+dnl
+dnl These flows simply count the packets that didn't execute a successful learn action:
+dnl
+table=1 cookie=1,reg0=0,in_port=1 actions=drop
+table=1 cookie=2,reg0=0,in_port=2 actions=drop
+table=1 cookie=3,reg0=0,in_port=3 actions=drop
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+for i in `seq 1001 1030`; do
+ovs-appctl netdev-dummy/receive p1 "in_port(1),eth(src=50:54:00:00:00:01,dst=50:54:00:00:00:ff),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=17,tos=0,ttl=64,frag=no),udp(src=1,dst=$i)"
+ovs-appctl netdev-dummy/receive p2 "in_port(2),eth(src=50:54:00:00:00:01,dst=50:54:00:00:00:ff),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=17,tos=0,ttl=64,frag=no),udp(src=1,dst=$i)"
+ovs-ap

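The "different limits" test sends packets with udp_dst 1001 to 1030 into learn actions targeting tables with limits 1, 10 and 20. It uses reg0[0] to split packets into "learned" and "not learned" counters. The sketch below models the counts one might expect. It assumes that every packet carries a distinct udp_dst, and therefore tries to add a distinct flow, and it ignores datapath caching. It is a hypothetical model, not derived from the test's (truncated) expected output.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one learn target table with a flow limit. */
struct limit_model {
    unsigned int limit;      /* Maximum flows learned into the table. */
    unsigned int n_flows;    /* Flows currently learned. */
    unsigned int n_success;  /* Packets that saw reg0[0] = 1. */
    unsigned int n_failure;  /* Packets that saw reg0[0] = 0. */
};

/* One packet with a new udp_dst: it tries to learn one more flow. */
static void
limit_model_packet(struct limit_model *m)
{
    bool learned = m->n_flows < m->limit;

    if (learned) {
        m->n_flows++;
        m->n_success++;
    } else {
        m->n_failure++;
    }
}
```

Under this model, 30 packets per port yield 1/29, 10/20 and 20/10 successful/failed learns for the limits 1, 10 and 20.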
[ovs-dev] [PATCH 6/7] ofproto: Implement learning limit.

2017-02-24 Thread Daniele Di Proietto
With this commit we honor the newly introduced limit to the learn
action.

When learning a new rule (with the limit set), the rule will hold a
reference to a counter.  The ukey that learned the rule will also keep
the same reference, so the counter will be decremented when both the
ukey and the original rule have been deleted.

This means that there's a small window between the learn flow expiry and
the next revalidation in which new flows are not learned because OVS
thinks that we would exceed the limit.  I think this is better than the
alternative: if we allowed learning in that interval we would not be
strictly enforcing the limit, because we still have the datapath flows
that are passing traffic for expired rules.

There's a small corner case when we have to slowpath the ukey.  This
happens when:
* The learned rule has expired (or has been deleted).
* The ukey that learned the rule is still in the datapath.
* No packets hit the datapath flow recently.
In this case we cannot relearn the rule (because there are no new
packets), and the actions might depend on the learn execution, so the
only option is to slowpath the ukey.  I don't think this has big
performance implications since it's done only for ukeys with no traffic.
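The counter lifetime described above behaves like a reference with two holders: the counter is decremented only after both the learned rule and the ukey that learned it are gone. Below is a single-threaded sketch of that behavior. struct learn_counter and struct learn_ref are hypothetical types. A real implementation would also need the locking discussed in this series.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-(table, cookie) count compared against the limit. */
struct learn_counter {
    unsigned int n_flows;
};

/* Hypothetical reference shared by a learned rule and its ukey. */
struct learn_ref {
    struct learn_counter *counter;
    int holders;
};

/* Called when a rule is learned: counts the flow and returns a reference
 * held by both the new rule and the ukey. */
static struct learn_ref *
learn_ref_create(struct learn_counter *c)
{
    struct learn_ref *r = malloc(sizeof *r);

    if (!r) {
        return NULL;
    }
    r->counter = c;
    r->holders = 2;          /* The rule and the ukey. */
    c->n_flows++;
    return r;
}

/* Drops one holder.  Only the last release (rule expiry or ukey deletion,
 * whichever comes second) decrements the counter. */
static void
learn_ref_release(struct learn_ref *r)
{
    assert(r->holders > 0);
    if (--r->holders == 0) {
        r->counter->n_flows--;
        free(r);
    }
}
```

The state between the two releases is the "small window" from the commit message: the rule has expired, but the flow still counts against the limit.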

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/odp-util.h |  6 --
 ofproto/ofproto-dpif-upcall.c  | 25 ++-
 ofproto/ofproto-dpif-xlate-cache.c |  6 +-
 ofproto/ofproto-dpif-xlate-cache.h |  1 +
 ofproto/ofproto-dpif-xlate.c   | 33 +++---
 ofproto/ofproto-dpif-xlate.h   |  3 +++
 ofproto/ofproto-dpif.c |  9 +---
 ofproto/ofproto-dpif.h |  2 +-
 ofproto/ofproto-provider.h | 17 +--
 ofproto/ofproto.c  | 42 --
 10 files changed, 121 insertions(+), 23 deletions(-)

diff --git a/lib/odp-util.h b/lib/odp-util.h
index 42011bccd..1f10d981d 100644
--- a/lib/odp-util.h
+++ b/lib/odp-util.h
@@ -41,11 +41,13 @@ struct pkt_metadata;
 SPR(SLOW_BFD,"bfd","Consists of BFD packets")   \
 SPR(SLOW_LACP,   "lacp",   "Consists of LACP packets")  \
 SPR(SLOW_STP,"stp","Consists of STP packets")   \
-SPR(SLOW_LLDP,   "lldp",   "Consists of LLDP packets")\
+SPR(SLOW_LLDP,   "lldp",   "Consists of LLDP packets")  \
 SPR(SLOW_CONTROLLER, "controller",  \
 "Sends \"packet-in\" messages to the OpenFlow controller")  \
 SPR(SLOW_ACTION, "action",  \
-"Uses action(s) not supported by datapath")
+"Uses action(s) not supported by datapath") \
+SPR(SLOW_MAY_LEARN,  "learn",   \
+"New packets may or may not learn new flows")   \
 
 /* Indexes for slow-path reasons.  Client code uses "enum slow_path_reason"
  * values instead of these, these are just a way to construct those. */
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 35b5b7533..60b24e77d 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -252,9 +252,13 @@ enum ukey_state {
 /* Each udpif_key can hold reference to global objects in an ofproto.  These
  * references are stored here. */
 struct ukey_refs {
-struct recirc_refs recircs;  /* Action recirc IDs with references held. */
+/* Action recirc IDs with references held. */
+struct recirc_refs recircs;
+/* References to counters used for learn action limits. */
+struct cookie_counter_refs learn_refs;
 };
-#define UKEY_REFS_INIT {RECIRC_REFS_EMPTY_INITIALIZER}
+#define UKEY_REFS_INIT \
+{RECIRC_REFS_EMPTY_INITIALIZER, COOKIE_COUNTER_REFS_INIT}
 
 /* 'udpif_key's are responsible for tracking the little bit of state udpif
  * needs to do flow expiration which can't be pulled directly from the
@@ -1495,9 +1499,13 @@ ukey_create__(const struct nlattr *key, size_t key_len,
 
 ukey->key_recirc_id = key_recirc_id;
 recirc_refs_init(&ukey->global_refs.recircs);
+cookie_counter_refs_init(&ukey->global_refs.learn_refs);
 if (xout) {
-/* Take ownership of the action recirc id references. */
+/* Take ownership of the action recirc id and learn counters
+ * references. */
 recirc_refs_swap(&ukey->global_refs.recircs, &xout->recircs);
+cookie_counter_refs_swap(&ukey->global_refs.learn_refs,
+ &xout->learn_refs);
 }
 
 return ukey;
@@ -1789,6 +1797,7 @@ ukey_delete__(struct udpif_key *ukey)
 recirc_free_id(ukey->key_recirc_id);
 }
 recirc_refs_unref(&ukey->

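In ukey_create__(), the ukey takes ownership of the references gathered during translation by swapping its empty sets with the ones in the xlate output. Afterward the ukey holds the references and the xlate output holds none, so later cleanup of the xlate output presumably cannot release them a second time. The sketch below shows that pattern with a hypothetical struct ref_set. The real recirc_refs and cookie_counter_refs types differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference set (e.g. recirc IDs or learn counters). */
struct ref_set {
    const int *ids;
    size_t n;
};

/* Exchanges the contents of 'a' and 'b' without copying or taking any new
 * references. */
static void
ref_set_swap(struct ref_set *a, struct ref_set *b)
{
    struct ref_set tmp = *a;

    *a = *b;
    *b = tmp;
}

/* Returns how many references releasing 's' would drop, then empties it. */
static size_t
ref_set_release(struct ref_set *s)
{
    size_t n = s->n;

    s->ids = NULL;
    s->n = 0;
    return n;
}
```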
[ovs-dev] [PATCH 1/7] ofp-actions: Factor out decode_LEARN_common().

2017-02-24 Thread Daniele Di Proietto
No functional change; it will be used by the next commit.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ofp-actions.c | 77 +++
 1 file changed, 43 insertions(+), 34 deletions(-)

diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index ce80f57e8..78f8c4366 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -4313,43 +4313,14 @@ learn_min_len(uint16_t header)
 return min_len;
 }
 
-/* Converts 'nal' into a "struct ofpact_learn" and appends that struct to
- * 'ofpacts'.  Returns 0 if successful, otherwise an OFPERR_*. */
 static enum ofperr
-decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
-   enum ofp_version ofp_version OVS_UNUSED,
-   const struct vl_mff_map *vl_mff_map,
-   struct ofpbuf *ofpacts)
+decode_LEARN_common(const void *p, const void *end,
+const struct vl_mff_map *vl_mff_map,
+struct ofpbuf *ofpacts)
 {
-struct ofpact_learn *learn;
-const void *p, *end;
+struct ofpact_learn *learn = ofpacts->header;
 
-if (nal->pad) {
-return OFPERR_OFPBAC_BAD_ARGUMENT;
-}
-
-learn = ofpact_put_LEARN(ofpacts);
-
-learn->idle_timeout = ntohs(nal->idle_timeout);
-learn->hard_timeout = ntohs(nal->hard_timeout);
-learn->priority = ntohs(nal->priority);
-learn->cookie = nal->cookie;
-learn->table_id = nal->table_id;
-learn->fin_idle_timeout = ntohs(nal->fin_idle_timeout);
-learn->fin_hard_timeout = ntohs(nal->fin_hard_timeout);
-
-learn->flags = ntohs(nal->flags);
-if (learn->flags & ~(NX_LEARN_F_SEND_FLOW_REM |
- NX_LEARN_F_DELETE_LEARNED)) {
-return OFPERR_OFPBAC_BAD_ARGUMENT;
-}
-
-if (learn->table_id == 0xff) {
-return OFPERR_OFPBAC_BAD_ARGUMENT;
-}
-
-end = (char *) nal + ntohs(nal->len);
-for (p = nal + 1; p != end; ) {
+while (p != end) {
 struct ofpact_learn_spec *spec;
 uint16_t header = ntohs(get_be16(&p));
 
@@ -4422,6 +4393,44 @@ decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
 return 0;
 }
 
+/* Converts 'nal' into a "struct ofpact_learn" and appends that struct to
+ * 'ofpacts'.  Returns 0 if successful, otherwise an OFPERR_*. */
+static enum ofperr
+decode_NXAST_RAW_LEARN(const struct nx_action_learn *nal,
+   enum ofp_version ofp_version OVS_UNUSED,
+   const struct vl_mff_map *vl_mff_map,
+   struct ofpbuf *ofpacts)
+{
+struct ofpact_learn *learn;
+
+if (nal->pad) {
+return OFPERR_OFPBAC_BAD_ARGUMENT;
+}
+
+learn = ofpact_put_LEARN(ofpacts);
+
+learn->idle_timeout = ntohs(nal->idle_timeout);
+learn->hard_timeout = ntohs(nal->hard_timeout);
+learn->priority = ntohs(nal->priority);
+learn->cookie = nal->cookie;
+learn->table_id = nal->table_id;
+learn->fin_idle_timeout = ntohs(nal->fin_idle_timeout);
+learn->fin_hard_timeout = ntohs(nal->fin_hard_timeout);
+
+learn->flags = ntohs(nal->flags);
+if (learn->flags & ~(NX_LEARN_F_SEND_FLOW_REM |
+ NX_LEARN_F_DELETE_LEARNED)) {
+return OFPERR_OFPBAC_BAD_ARGUMENT;
+}
+
+if (learn->table_id == 0xff) {
+return OFPERR_OFPBAC_BAD_ARGUMENT;
+}
+
+return decode_LEARN_common(nal + 1, (char *) nal + ntohs(nal->len),
+   vl_mff_map, ofpacts);
+}
+
 static void
 put_be16(struct ofpbuf *b, ovs_be16 x)
 {
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/7] ofp-actions: Add limit to learn action.

2017-02-24 Thread Daniele Di Proietto
This commit adds a new feature to the learn actions: the possibility to
limit the number of learned flows.

To be compatible with users of the old learn action, a new structure is
introduced as well as a new OpenFlow raw action number.

This commit only implements parsing of the action and documentation.
A later commit will implement the feature in ofproto-dpif.
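The NX_LEARN_F_WRITE_RESULT semantics can be sketched minimally as follows. The sketch assumes that the 1-bit result_dst is simply a bit offset inside a 32-bit register (for example result_dst=reg0[3]). The struct and function names are hypothetical and are not part of the patch, whose real code goes through struct mf_subfield.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 1-bit 'result_dst' subfield: just the bit
 * offset inside a 32-bit register such as reg0. */
struct result_bit {
    unsigned int ofs;           /* Bit offset, 0..31. */
};

/* Sets the selected bit to 1 if the learn installed a flow and to 0 if it
 * did not (e.g. because the limit was hit).  Other bits are preserved. */
static uint32_t
write_learn_result(uint32_t reg, struct result_bit dst, bool learned)
{
    assert(dst.ofs < 32);
    uint32_t mask = UINT32_C(1) << dst.ofs;

    return learned ? (reg | mask) : (reg & ~mask);
}
```

Because only one bit is touched, later pipeline stages can match on that bit alone, which fits the parser's requirement that result_dst be a single bit.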

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 include/openvswitch/ofp-actions.h |  12 
 lib/learn.c   |  28 
 lib/ofp-actions.c | 135 ++
 tests/ofp-actions.at  |  14 
 utilities/ovs-ofctl.8.in  |  19 ++
 5 files changed, 197 insertions(+), 11 deletions(-)

diff --git a/include/openvswitch/ofp-actions.h b/include/openvswitch/ofp-actions.h
index 88f573dcd..c1199a4ec 100644
--- a/include/openvswitch/ofp-actions.h
+++ b/include/openvswitch/ofp-actions.h
@@ -652,6 +652,10 @@ struct ofpact_resubmit {
  * If NX_LEARN_F_SEND_FLOW_REM is set, then the learned flows will have their
  * OFPFF_SEND_FLOW_REM flag set.
  *
+ * If NX_LEARN_F_WRITE_RESULT is set, then the actions will write whether the
+ * learn operation succeeded on a bit.  If the learn is successful the bit will
+ * be set, otherwise (e.g. if the limit is hit) the bit will be unset.
+ *
  * If NX_LEARN_F_DELETE_LEARNED is set, then removing this action will delete
  * all the flows from the learn action's 'table_id' that have the learn
  * action's 'cookie'.  Important points:
@@ -677,6 +681,7 @@ struct ofpact_resubmit {
 enum nx_learn_flags {
 NX_LEARN_F_SEND_FLOW_REM = 1 << 0,
 NX_LEARN_F_DELETE_LEARNED = 1 << 1,
+NX_LEARN_F_WRITE_RESULT = 1 << 2,
 };
 
 #define NX_LEARN_N_BITS_MASK0x3ff
@@ -740,6 +745,13 @@ struct ofpact_learn {
 ovs_be64 cookie;   /* Cookie for new flow. */
 uint16_t fin_idle_timeout; /* Idle timeout after FIN, if nonzero. */
 uint16_t fin_hard_timeout; /* Hard timeout after FIN, if nonzero. */
+/* If the number of learned flows on 'table_id' with 'cookie' exceeds
+ * this, the learn action will not add a new flow. */
+uint32_t limit;
+/* Used only if 'flags' has NX_LEARN_F_WRITE_RESULT.  If the execution
+ * failed to install a new flow because 'limit' is exceeded,
+ * result_dst will be set to 0, otherwise to 1. */
+struct mf_subfield result_dst;
 );
 
 struct ofpact_learn_spec specs[];
diff --git a/lib/learn.c b/lib/learn.c
index ce52c35f2..b1b8bc52b 100644
--- a/lib/learn.c
+++ b/lib/learn.c
@@ -406,6 +406,26 @@ learn_parse__(char *orig, char *arg, struct ofpbuf *ofpacts)
 learn->flags |= NX_LEARN_F_SEND_FLOW_REM;
 } else if (!strcmp(name, "delete_learned")) {
 learn->flags |= NX_LEARN_F_DELETE_LEARNED;
+} else if (!strcmp(name, "limit")) {
+learn->limit = atoi(value);
+} else if (!strcmp(name, "result_dst")) {
+char *error;
+learn->flags |= NX_LEARN_F_WRITE_RESULT;
+error = mf_parse_subfield(&learn->result_dst, value);
+if (error) {
+return error;
+}
+if (!mf_nxm_header(learn->result_dst.field->id)) {
+return xasprintf("experimenter OXM field '%s' not supported",
+ value);
+}
+if (!learn->result_dst.field->writable) {
+return xasprintf("%s is read-only", value);
+}
+if (learn->result_dst.n_bits != 1) {
+return xasprintf("result_dst in 'learn' action must be a "
+ "single bit");
+}
 } else {
 struct ofpact_learn_spec *spec;
 char *error;
@@ -487,6 +507,14 @@ learn_format(const struct ofpact_learn *learn, struct ds *s)
 ds_put_format(s, ",%scookie=%s%#"PRIx64,
   colors.param, colors.end, ntohll(learn->cookie));
 }
+if (learn->limit != 0) {
+ds_put_format(s, ",%slimit=%s%"PRIu32,
+  colors.param, colors.end, learn->limit);
+}
+if (learn->flags & NX_LEARN_F_WRITE_RESULT) {
+ds_put_format(s, ",%sresult_dst=%s", colors.param, colors.end);
+mf_format_subfield(&learn->result_dst, s);
+}
 
 OFPACT_LEARN_SPEC_FOR_EACH (spec, learn) {
 unsigned int n_bytes = DIV_ROUND_UP(spec->n_bits, 8);
diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c
index 78f8c4366..1c77e7bbd 100644
--- a/lib/ofp-actions.c
+++ b/lib/ofp-actions.c
@@ -292,6 +292,8 @@ enum ofp_raw_action_type {
 
 /* NX1.0+(16): struct nx_action_learn, ... VLMFF */
 NXAST_RAW_LEARN,
+/* NX1.0+(44): struct nx_action_learn2, ... VLMFF */
+NXAST_RAW_LEA

[ovs-dev] [PATCH 0/7] Learn action limit

2017-02-24 Thread Daniele Di Proietto
This series implements the possibility to have a limit on the number of
flows learned by a learn action.  After the learn action execution the
pipeline can read the result (to know if the limit was exceeded).

Since the datapath flows cache the result of the learn execution, we have
to postpone decreasing the counters until the ukey is deleted, otherwise
we could allow more packets than the limit prescribes.
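The counting rule can be illustrated with a hedged sketch of a per-cookie learn counter. The names here are hypothetical; the real map lives in ofproto/cookie-counters.c, which is not shown in this excerpt. The sketch makes two assumptions. First, a limit of 0 means "no limit", which matches learn_format() printing the limit only when it is nonzero. Second, at most 'limit' flows may be learned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-(table, cookie) counter. */
struct learn_counter {
    uint32_t n_flows;           /* Learned flows currently accounted for. */
};

/* Tries to account for one more learned flow.  A 'limit' of 0 means no
 * limit (assumption).  Returns true if the flow may be installed. */
static bool
learn_counter_try_add(struct learn_counter *c, uint32_t limit)
{
    if (limit != 0 && c->n_flows >= limit) {
        return false;           /* Limit reached: do not learn. */
    }
    c->n_flows++;
    return true;
}

/* Releases one reference.  Following the cover letter, the caller would do
 * this only when the ukey is deleted, because the datapath flow caches the
 * result of the learn execution until then. */
static void
learn_counter_release(struct learn_counter *c)
{
    assert(c->n_flows > 0);
    c->n_flows--;
}
```

If the release happened as soon as the learned flow expired, the cached datapath flow could keep admitting packets based on a learn result that the counter no longer reflects. Deferring the release avoids that.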

The beginning of the series implements the new Openflow interface and
updates the documentation.

The fourth commit implements a map that keeps the counters.

The sixth commit implements the actual logic and the last commit has some
basic tests.

Daniele Di Proietto (7):
  ofp-actions: Factor out decode_LEARN_common().
  ofp-actions: Add limit to learn action.
  ofproto-dpif-xlate: Create XC_LEARN entry after learning.
  ofproto: New cookie-counters module.
  ofproto-dpif-upcall: Encapsulate 'struct recirc_refs' into struct
'ukey_refs'.
  ofproto: Implement learning limit.
  tests: Add learn action with limit tests.

 include/openvswitch/ofp-actions.h  |  12 +++
 lib/learn.c|  28 +
 lib/odp-util.h |   6 +-
 lib/ofp-actions.c  | 212 
 ofproto/automake.mk|   2 +
 ofproto/cookie-counters.c  | 216 +
 ofproto/cookie-counters.h  |  85 +++
 ofproto/ofproto-dpif-upcall.c  |  64 +++
 ofproto/ofproto-dpif-xlate-cache.c |   6 +-
 ofproto/ofproto-dpif-xlate-cache.h |   1 +
 ofproto/ofproto-dpif-xlate.c   |  53 +++--
 ofproto/ofproto-dpif-xlate.h   |   3 +
 ofproto/ofproto-dpif.c |   9 +-
 ofproto/ofproto-dpif.h |   2 +-
 ofproto/ofproto-provider.h |  17 ++-
 ofproto/ofproto.c  |  42 ++--
 tests/learn.at | 175 ++
 tests/ofp-actions.at   |  14 +++
 utilities/ovs-ofctl.8.in   |  19 
 19 files changed, 877 insertions(+), 89 deletions(-)
 create mode 100644 ofproto/cookie-counters.c
 create mode 100644 ofproto/cookie-counters.h

-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-dpdk: Fix rx_error stat for dpdk ports.

2017-02-16 Thread Daniele Di Proietto
2017-02-16 7:31 GMT-08:00 Ian Stokes :
> "rx_error" stat for a DPDK interface was calculated with the assumption that
> dropped packets due to hardware buffer overload were counted as errors
> in DPDK and the rte ierror stat included rte imissed packets i.e.
>
> rx_errors = rte_stats.ierrors - rte_stats.imissed
>
> This results in negative statistic values as imissed packets are no longer
> counted as part of ierror since DPDK v.16.04.
>
> Fix this by setting rx_errors equal to ierrors only.
>
> Fixes: 9e3ddd45 (netdev-dpdk: Add some missing statistics.)
> CC: Timo Puha )
> Reported-by: Stepan Andrushko 
> Signed-off-by: Ian Stokes 

Good catch!

I've applied this to master, branch-2.7 and branch-2.6.
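The old formula could report a bogus value, as this small sketch shows. It assumes that both counters are unsigned 64-bit, as in DPDK's struct rte_eth_stats. Once imissed is no longer part of ierrors, it can exceed ierrors, and the subtraction wraps around instead of going below zero. The result is a huge value, which appears negative when printed as signed. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Old formula: assumed DPDK included 'imissed' in 'ierrors'.  Since DPDK
 * 16.04 it does not, so 'imissed' may exceed 'ierrors' and the unsigned
 * subtraction wraps around. */
static uint64_t
rx_errors_old(uint64_t ierrors, uint64_t imissed)
{
    return ierrors - imissed;
}

/* Fixed formula from the patch: report 'ierrors' as is. */
static uint64_t
rx_errors_fixed(uint64_t ierrors)
{
    return ierrors;
}
```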

> ---
>  lib/netdev-dpdk.c |3 +--
>  1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 94568a1..ee53c4c 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2067,8 +2067,7 @@ out:
>  stats->tx_packets = rte_stats.opackets;
>  stats->rx_bytes = rte_stats.ibytes;
>  stats->tx_bytes = rte_stats.obytes;
> -/* DPDK counts imissed as errors, but count them here as dropped instead */
> -stats->rx_errors = rte_stats.ierrors - rte_stats.imissed;
> +stats->rx_errors = rte_stats.ierrors;
>  stats->tx_errors = rte_stats.oerrors;
>
>  rte_spinlock_lock(&dev->stats_lock);
> --
> 1.7.0.7
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v10] dpif-netdev: Conditional EMC insert

2017-02-16 Thread Daniele Di Proietto
2017-02-16 3:01 GMT-08:00 Kevin Traynor :
> On 02/16/2017 10:22 AM, Ciara Loftus wrote:
>> Unconditional insertion of EMC entries results in EMC thrashing at high
>> numbers of parallel flows. When this occurs, the performance of the EMC
>> often falls below that of the dpcls classifier, rendering the EMC
>> practically useless.
>>
>> Instead of unconditionally inserting entries into the EMC when a miss
>> occurs, use a 1% probability of insertion. This ensures that the most
>> frequent flows have the highest chance of creating an entry in the EMC,
>> and the probability of thrashing the EMC is also greatly reduced.
>>
>> The probability of insertion is configurable, via the
>> other_config:emc-insert-inv-prob option. This value sets the average
>> probability of insertion to 1/emc-insert-inv-prob.
>>
>> For example the following command changes the insertion probability to
>> (on average) 1 in every 20 packets, i.e. 1/20, i.e. 5%.
>>
>> ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=20
>>
>> Signed-off-by: Ciara Loftus 
>> Signed-off-by: Georg Schmuecking 
>> Co-authored-by: Georg Schmuecking 
>> Acked-by: Kevin Traynor 
>> ---
>> v10:
>> - Fixed typo in commit message
>> - Only store insert_min when value has changed
>> - Add prints to reflect changes in the DB
>
> Thanks for the changes, LGTM.
> Kevin.

Thanks a lot.  I squashed the following incremental to support values
that don't fit in a signed integer:

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 35d3eda5e..31aee51a2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2762,7 +2762,9 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
 const char *cmask = smap_get(other_config, "pmd-cpu-mask");
-int insert_prob = smap_get_int(other_config, "emc-insert-inv-prob", -1);
+unsigned long long insert_prob =
+smap_get_ullong(other_config, "emc-insert-inv-prob",
+DEFAULT_EM_FLOW_INSERT_INV_PROB);
 uint32_t insert_min, cur_min;

 if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
@@ -2772,7 +2774,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
 }

 atomic_read_relaxed(&dp->emc_insert_min, &cur_min);
-if (insert_prob >= 0 && insert_prob <= UINT32_MAX) {
+if (insert_prob <= UINT32_MAX) {
 insert_min = insert_prob == 0 ? 0 : UINT32_MAX / insert_prob;
 } else {
 insert_min = DEFAULT_EM_FLOW_INSERT_MIN;
@@ -2784,7 +2786,7 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
 if (insert_min == 0) {
 VLOG_INFO("EMC has been disabled");
 } else {
-VLOG_INFO("EMC insertion probability changed to 1/%i (~%.2f%%)",
+VLOG_INFO("EMC insertion probability changed to 1/%llu (~%.2f%%)",
   insert_prob, (100 / (float)insert_prob));
 }
 }

and pushed this to master.
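The configured value maps to an insertion decision as sketched below. The sketch assumes that the lookup side inserts when a random 32-bit value is <= emc_insert_min; that code is not part of this excerpt. xorshift32() is a hypothetical stand-in for random_uint32() from lib/random.h. The conversion mirrors the squashed incremental above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_DEFAULT_INV_PROB 100ULL  /* Default: 1 insertion in 100. */

/* Maps other_config:emc-insert-inv-prob to the insertion threshold, as in
 * the incremental: 0 disables insertion, and values that do not fit in
 * 32 bits fall back to the default. */
static uint32_t
emc_insert_min_from_inv_prob(unsigned long long inv_prob)
{
    if (inv_prob > UINT32_MAX) {
        inv_prob = SKETCH_DEFAULT_INV_PROB;
    }
    return inv_prob == 0 ? 0 : (uint32_t) (UINT32_MAX / inv_prob);
}

/* Hypothetical PRNG standing in for random_uint32(); '*state' must be
 * nonzero, and then the output is never zero. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Decides whether an EMC miss should insert a new entry. */
static bool
emc_should_insert(uint32_t insert_min, uint32_t *state)
{
    return insert_min != 0 && xorshift32(state) <= insert_min;
}
```

With emc-insert-inv-prob=20 the threshold is UINT32_MAX / 20, so roughly 5% of misses insert. A value of 1 always inserts, and 0 disables the EMC.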
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] vswitchd: Move config_ofproto_types call before bridge_add_port

2017-02-15 Thread Daniele Di Proietto
2017-02-15 10:02 GMT-08:00 Shashank Ram :
> Currently, the call to config_ofproto_types() happens at the end
> of bridge_reconfigure(), after missing ofprotos and ports are created.
> However, it might be useful to make this call before adding missing
> ports through the dpif interface. With the current use case
> (dpif-netdev), this will save us a reconfiguration cycle.
>
> The call to config_ofproto_types was introduced as a
> part of passing the Openvswitch other_config smap to dpif.
> However, if we want to do this before the ports are added,
> it needs to be done after ofproto_create() is called so that
> dpif_backer is added to all_dpif_backers list. Once the
> dpif_backer is added, the call to config_ofproto_types()
> will ensure that the set_config handler in dpif-netdev/netlink.c
> is called.
>
> Signed-off-by: Shashank Ram 

Thanks, applied to master

> ---
>  vswitchd/bridge.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index 21c3c79..2e10013 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -654,6 +654,9 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>  }
>  }
>  }
> +
> +config_ofproto_types(&ovs_cfg->other_config);
> +
>  HMAP_FOR_EACH (br, node, &all_bridges) {
>  bridge_add_ports(br, &br->wanted_ports);
>  shash_destroy(&br->wanted_ports);
> @@ -706,8 +709,6 @@ bridge_reconfigure(const struct ovsrec_open_vswitch *ovs_cfg)
>  }
>  free(managers);
>
> -config_ofproto_types(&ovs_cfg->other_config);
> -
>  /* The ofproto-dpif provider does some final reconfiguration in its
>   * ->type_run() function.  We have to call it before notifying the database
>   * client that reconfiguration is complete, otherwise there is a very
> --
> 2.6.2
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] rhel/ifup: support vhost-user client mode

2017-02-10 Thread Daniele Di Proietto
2017-02-10 9:49 GMT-08:00 Aaron Conole :
> From: Aaron Conole 
>
> This adds support for ifup to configure client-mode sockets by exposing
> two new variables $OVS_PORT_MODE and $OVS_PORT_PATH to the ifcfg
> scripts.  When OVS_PORT_MODE is set to 'client', the OVS_PORT_PATH will
> be passed as the vhost-server-path option.
>
> No change is needed to ifdown because the OVSDPDKVhostUserPort type
> already has an appropriate entry.
>
> Signed-off-by: Aaron Conole 

Thanks, applied to master, branch-2.7

> ---
>  rhel/README.RHEL.rst|  9 +
>  rhel/etc_sysconfig_network-scripts_ifup-ovs | 10 +-
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/rhel/README.RHEL.rst b/rhel/README.RHEL.rst
> index afccf17..8d57cb2 100644
> --- a/rhel/README.RHEL.rst
> +++ b/rhel/README.RHEL.rst
> @@ -73,6 +73,15 @@ OVS_PATCH_PEER
>For "OVSPatchPort" devices, this field specifies the patch's peer on the
>other bridge.
>
> +OVS_PORT_MODE
> +  For "OVSDPDKVhostUserPort" devices, this field can be set to "client" which
> +  indicates that the port will be used in client mode.
> +
> +OVS_PORT_PATH
> +  For "OVSDPDKVhostUserPort" devices, this field specifies the path to the
> +  vhost-user server socket.  It will only be used if OVS_PORT_MODE is set to
> +  "client".
> +
>  Note
>  
>
> diff --git a/rhel/etc_sysconfig_network-scripts_ifup-ovs b/rhel/etc_sysconfig_network-scripts_ifup-ovs
> index e49e6fe..b95220a 100755
> --- a/rhel/etc_sysconfig_network-scripts_ifup-ovs
> +++ b/rhel/etc_sysconfig_network-scripts_ifup-ovs
> @@ -181,10 +181,18 @@ case "$TYPE" in
> ;;
> OVSDPDKVhostUserPort)
> ifup_ovs_bridge
> +   PORT_TYPE="dpdkvhostuser"
> +   PORT_PATH=""
> +   if [ "$OVS_PORT_MODE" == "client" ]; then
> +   PORT_TYPE="dpdkvhostuserclient"
> +   PORT_PATH="options:vhost-server-path=${OVS_PORT_PATH}"
> +   fi
> ovs-vsctl -t ${TIMEOUT} \
> -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE" \
> -- add-port "$OVS_BRIDGE" "$DEVICE" $OVS_OPTIONS \
> -   -- set Interface "$DEVICE" type=dpdkvhostuser ${OVS_EXTRA+-- $OVS_EXTRA}
> +   -- set Interface "$DEVICE" type=$PORT_TYPE \
> +   $PORT_PATH \
> +   ${OVS_EXTRA+-- $OVS_EXTRA}
> ;;
> OVSDPDKBond)
> ifup_ovs_bridge
> --
> 2.9.3
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/3] rhel: Fix ifup and ifdown after DPDK naming change.

2017-02-10 Thread Daniele Di Proietto





On 02/02/2017 12:48, "Ben Pfaff" <b...@ovn.org> wrote:

>On Tue, Jan 24, 2017 at 06:21:53PM -0800, Daniele Di Proietto wrote:
>> Names like dpdk0 and dpdk1 are not enough to identify a DPDK interface.
>> We could update README.RHEL.rst and add
>> 
>> OVS_EXTRA='set Interface ${DEVICE} options:dpdk-devargs=:01:00.0'
>> 
>> but a better solution is to add new parameters in the configuration file
>> to explicitly specify the dpdk-devargs.
>> 
>> Fixes: 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming")
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>This seems useful.

Hi Ben,

thanks for looking at this one and sorry for the delay.


>
>I don't understand why this uses "set" then $1.  Are you concerned that
>BOND_DPDK_DEVARGS might have multiple words and you want to get just the
>first one?

Now for each interface we need to specify two parameters: the name (it is
chosen by the user and it can be arbitrary) and the devargs (most likely the
PCI address).

With this patch the user enters the names in BOND_IFACES and the devargs in
BOND_DPDK_DEVARGS.

set -- ${BOND_DPDK_DEVARGS}
for _iface in ${BOND_IFACES}; do
    echo $_iface $1
    shift
done

is a quick and dirty way to iterate through both lists in the same loop.

Or maybe we could change the interface to specify in the same list the
name and the devargs.

Aaron, since you were looking at this as well, do you have any preference
on the user interface?

Thanks,

Daniele

>
>>  OVSDPDKBond)
>>  ifup_ovs_bridge
>> +set -- ${BOND_DPDK_DEVARGS}
>>  for _iface in $BOND_IFACES; do
>> -IFACE_TYPES="${IFACE_TYPES} -- set interface ${_iface} type=dpdk"
>> +IFACE_TYPES="${IFACE_TYPES} -- set interface ${_iface} type=dpdk options:dpdk-devargs=$1"
>> +shift
>>  done
>>  ovs-vsctl -t ${TIMEOUT} \
>>  -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE" \
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] dpif-netdev: Conditional EMC insert

2017-02-03 Thread Daniele Di Proietto
2017-01-31 9:55 GMT-08:00 Ciara Loftus :
> Unconditional insertion of EMC entries results in EMC thrashing at high
> numbers of parallel flows. When this occurs, the performance of the EMC
> often falls below that of the dpcls classifier, rendering the EMC
> practically useless.
>
> Instead of unconditionally inserting entries into the EMC when a miss
> occurs, use a 1% probability of insertion. This ensures that the most
> frequent flows have the highest chance of creating an entry in the EMC,
> and the probability of thrashing the EMC is also greatly reduced.
>
> The probability of insertion is configurable, via the
> other_config:emc-insert-prob option. For example the following command
> increases the insertion probability to 1/10, i.e. 10%.
>
> ovs-vsctl set Open_vSwitch . other_config:emc-insert-prob=10
>
> Signed-off-by: Ciara Loftus 
> Signed-off-by: Georg Schmuecking 
> Co-authored-by: Georg Schmuecking 

Thanks for v3.

We should add Georg to AUTHORS.rst

One of the unit tests ("PMD - stats") fails, because it checks the emc
hit stats.

I think we can fix it by configuring emc-insert-prob to 1 in that test
(assuming that we accept emc-insert-prob even without DPDK).

More comments inline.

> ---
> v3:
> - Use new dpif other_config infrastructure to tidy up how the
>   emc-insert-prob value is passed to dpif-netdev.
> v2:
> - Enable probability configurability via other_config:emc-insert-prob
>   option.
>
>  Documentation/howto/dpdk.rst | 20 +
>  NEWS |  2 ++
>  lib/dpif-netdev.c| 53 ++--
>  vswitchd/vswitch.xml | 16 +
>  4 files changed, 89 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d1e6e89..f2e888b 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -354,6 +354,26 @@ the `DPDK documentation
>
>  Note: Not all DPDK virtual PMD drivers have been tested and verified to work.
>
> +EMC Insertion Probability
> +-
> +By default 1 in every 100 flows are inserted into the Exact Match Cache (EMC).
> +It is possible to change this insertion probability by setting the
> +``emc-insert-prob`` option::
> +
> +$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-prob=N
> +
> +where:
> +
> +``N``
> +  is an integer between 0 and 4294967295 (maximum unsigned 32-bit int).
> +
> +If ``N`` is set to 1, an insertion will be performed for every flow. The lower
> +the value of ``emc-insert-prob`` the higher the probability of insertion,
> +except for the value 0 which will result in no insertions being performed and
> +thus essentially disabling the EMC.
> +
> +For more information on the EMC refer to :doc:`/intro/install/dpdk` .
> +
>  .. _dpdk-ovs-in-guest:
>
>  OVS with DPDK Inside VMs
> diff --git a/NEWS b/NEWS
> index 6838649..5a21580 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -65,6 +65,8 @@ Post-v2.6.0
> device will not be available for use until a valid dpdk-devargs is
> specified.
>   * Virtual DPDK Poll Mode Driver (vdev PMD) support.
> + * EMC insertion probability is reduced to 1/100 and is configurable via
> +   the new 'other_config:emc-insert-prob' option.
> - Fedora packaging:
>   * A package upgrade does not automatically restart OVS service.
> - ovs-vswitchd/ovs-vsctl:
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 0be5db5..e514ddb 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -144,6 +144,10 @@ struct netdev_flow_key {
>  #define EM_FLOW_HASH_MASK (EM_FLOW_HASH_ENTRIES - 1)
>  #define EM_FLOW_HASH_SEGS 2
>
> +/* Default EMC insert probability is 1 / DEFAULT_EM_FLOW_INSERT_PROB */
> +#define DEFAULT_EM_FLOW_INSERT_PROB 100
> +#define DEFAULT_EM_FLOW_INSERT_MIN (UINT32_MAX / DEFAULT_EM_FLOW_INSERT_PROB)
> +
>  struct emc_entry {
>  struct dp_netdev_flow *flow;
>  struct netdev_flow_key key;   /* key.hash used for emc hash value. */
> @@ -254,6 +258,9 @@ struct dp_netdev {
>  uint64_t last_tnl_conf_seq;
>
>  struct conntrack conntrack;
> +
> +/* Probability of EMC insertions is a factor of 'emc_insert_min'.*/
> +atomic_uint32_t emc_insert_min;

I'm not sure this makes sense, but I'm worried about emc_insert_min sharing the
same cacheline with other fields (in this case inside conntrack) that are
updated more often.  Could we perhaps solve this by prefixing it with
OVS_ALIGNED_VAR(CACHE_LINE_SIZE)?
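The suggestion can be sketched in plain C11, assuming a 64-byte cache line. Here alignas() plays the role of OVS_ALIGNED_VAR(CACHE_LINE_SIZE), and the struct is a hypothetical stand-in for struct dp_netdev. Alignment only separates the field from the members declared before it. Any fields declared after it would still share its line, and the containing object must itself be allocated with cache-line alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64    /* Assumed cache-line size in bytes. */

/* Hypothetical stand-in for struct dp_netdev. */
struct dp_sketch {
    uint64_t busy_state[4];     /* Stand-in for frequently written fields. */

    /* Read on the packet path, written only on reconfiguration: start a
     * new cache line so that writes to the fields above do not invalidate
     * the line this value is read from. */
    alignas(SKETCH_CACHE_LINE) uint32_t emc_insert_min;
};

static_assert(offsetof(struct dp_sketch, emc_insert_min)
              % SKETCH_CACHE_LINE == 0,
              "emc_insert_min must start a cache line");
```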

>  };
>
>  static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev *dp,
> @@ -1066,6 +1073,8 @@ create_dp_netdev(const char *name, const struct dpif_class *class,
>
>  conntrack_init(&dp->conntrack);
>
> +atomic_init(&dp->emc_insert_min, DEFAULT_EM_FLOW_INSERT_MIN);
> +
>  cmap_init(>poll_threads);
>  

Re: [ovs-dev] [PATCH v2 1/1] doc: Remove "experimental" warning for userspace.

2017-02-03 Thread Daniele Di Proietto
2017-02-03 9:38 GMT-08:00 Kevin Traynor :
> On 02/02/2017 08:22 PM, Stokes, Ian wrote:
>>> On 02/02/2017 04:44 PM, Ian Stokes wrote:
 Remove the experimental warning tag in documentation regarding OVS
 deployed via userspace.

 Signed-off-by: Ian Stokes 
 ---
  Documentation/intro/install/dpdk.rst  |3 ---
  Documentation/intro/install/userspace.rst |4 
  NEWS  |2 ++
  README.rst|6 +++---
  4 files changed, 5 insertions(+), 10 deletions(-)

 diff --git a/Documentation/intro/install/dpdk.rst
 b/Documentation/intro/install/dpdk.rst
 index fff0a1a..3018590 100644
 --- a/Documentation/intro/install/dpdk.rst
 +++ b/Documentation/intro/install/dpdk.rst
 @@ -29,9 +29,6 @@ This document describes how to build and install
 Open vSwitch using a DPDK  datapath. Open vSwitch can use the DPDK
 library to operate entirely in  userspace.

 -.. warning::
 -  The DPDK support of Open vSwitch is considered 'experimental'.
 -
  Build requirements
  --

 diff --git a/Documentation/intro/install/userspace.rst
 b/Documentation/intro/install/userspace.rst
 index 0368527 ..ebd0c12 100644
 --- a/Documentation/intro/install/userspace.rst
 +++ b/Documentation/intro/install/userspace.rst
 @@ -34,10 +34,6 @@ This version of Open vSwitch should be built
 manually with ``configure`` and  been recently tested, and so Debian
 packages are not a recommended way to use  this version of Open vSwitch.

 -.. warning::
 -  The userspace-only mode of Open vSwitch is considered experimental. It has
 -  not been thoroughly tested.
 -
  Building and Installing
  ---

 diff --git a/NEWS b/NEWS
 index 5efcce2..8600f0e 100644
 --- a/NEWS
 +++ b/NEWS
 @@ -3,6 +3,8 @@ Post-v2.7.0
 - Tunnels:
   * Added support to set packet mark for tunnel endpoint using
 `egress_pkt_mark` OVSDB option.
 +   - Documentation:
 + * OVS deployed in userspace mode no longer tagged as experimental.
>>>
>>> I think this would be a bit clearer. What do you think?
>>>
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -59,4 +59,5 @@ v2.7.0 - xx xxx 
>>>   * Removed support for IPsec tunnels.
>>> - DPDK:
>>> + * Removal of experimental tag.
>>
>> We can put it under a DPDK header like you have here, I wonder though is the 
>> removal of experimental only relevant to DPDK or do we need to consider OVS 
>> userspace without DPDK also? I would think both are well tested at this 
>> stage so experimental could be removed for both. Thoughts?
>>
>
> Fine by me. I just want it to be clear that its removal is for/covers
> the DPDK datapath.

Thanks for the patch, I'm in favor of the change for DPDK.

Unless you have a strong reason to do so, I'd prefer to keep the
experimental tag for the userspace datapath without DPDK for the
following reasons:

* I don't see a lot of valid use cases for it.  I think it is
important for testing.
* There's at least one known problem with accessing Linux devices in
userspace without DPDK for containers with offloads.  See
ddcf96d2dcc1 ("system-tests: Disable offloads in userspace tests.")

What do you think?

Thanks,

Daniele

>
>>>
>>>

  v2.7.0 - xx xxx 
  -
 diff --git a/README.rst b/README.rst
 index f5cdaa5..90050e3 100644
 --- a/README.rst
 +++ b/README.rst
 @@ -38,9 +38,9 @@ following features:

  The included Linux kernel module supports Linux 3.10 and up.

 -Open vSwitch can also operate, at a cost in performance, entirely in
 -userspace, without assistance from a kernel module.  This userspace
 -implementation should be easier to port than the kernel-based switch.
 -It is considered experimental.
 +Open vSwitch can also operate entirely in userspace without
 +assistance from a kernel module.  This userspace implementation
 +should be easier to port than the kernel-based switch.

  What's here?
  

>>
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] doc: Update DPDK version for 2.7 release.

2017-02-03 Thread Daniele Di Proietto
2017-02-02 8:59 GMT-08:00 Kevin Traynor :
> On 02/02/2017 04:30 PM, Ian Stokes wrote:
>> Add DPDK version required for the OVS 2.7 release in documentation.
>>
>> Signed-off-by: Ian Stokes 
>
> Acked-by: Kevin Traynor 

Thanks, pushed to master and branch-2.7

>
>> ---
>>  Documentation/faq/releases.rst |1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
>> index fcff5c3..319c2d7 100644
>> --- a/Documentation/faq/releases.rst
>> +++ b/Documentation/faq/releases.rst
>> @@ -159,6 +159,7 @@ Q: What DPDK version does each Open vSwitch release work with?
>>  2.4.x2.0
>>  2.5.x2.2
>>  2.6.x16.07
>> +2.7.x16.11
>>   =
>>
>>  Q: I get an error like this when I configure Open vSwitch::
>>
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/3] rhel: Remove obsolete OVSDPDKVhostPort from ifdown script.

2017-02-03 Thread Daniele Di Proietto





On 02/02/2017 11:49, "Ben Pfaff" <b...@ovn.org> wrote:

>On Tue, Jan 24, 2017 at 06:21:52PM -0800, Daniele Di Proietto wrote:
>> The support for vhost cuse port has been removed long ago.
>> 
>> Fixes:419876444357("netdev-dpdk: Remove dpdkvhostcuse ports")
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Acked-by: Ben Pfaff <b...@ovn.org>

Thanks, pushed to master, branch-2.7 and branch-2.6
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH V2] netdev-dpdk: fix ifindex assignment for DPDK ports

2017-01-31 Thread Daniele Di Proietto





On 31/01/2017 13:52, "Ben Pfaff"  wrote:

>On Thu, Dec 08, 2016 at 01:16:22PM +0100, Przemyslaw Lal wrote:
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index de78ddd..ef99eb3 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -2075,7 +2075,13 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev)
>>  int ifindex;
>>  
>>  ovs_mutex_lock(&dev->mutex);
>> -ifindex = dev->port_id;
>> +/* Calculate hash from the netdev name using hash_bytes() function.
>> + * Because ifindex is declared as signed int in the kernel sources and
>> + * OVS follows this implementation right shift is needed to set sign bit
>> + * to 0 and then XOR to slightly improve collision rate.
>> + */
>> +uint32_t h = hash_bytes(netdev->name, strlen(netdev->name), 0);
>> +ifindex = (int)((h >> 1) ^ (h & 0x0FFF));
>
>To hash a string, please use hash_string().
>
>Daniele, are you planning to review this?

Sorry for the delay.

At some point, with vhost-pmd we will have port_ids also for vhost interfaces.  
Maybe we can revisit this approach when that becomes available.

I wish there was a better way to avoid collisions with the linux kernel, but I 
can't think of anything generic.

Unless someone else has an objection I'm fine with the approach.  Two minor 
comments:

Could you please use hash_string(), as suggested by Ben?

I guess the result after hashing and XOR could still be zero.  Could you maybe 
add a check for that case and set it to something else?
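The requested change can be sketched as follows, assuming the hash is computed over the port name. FNV-1a below is only a stand-in for OVS's hash_string(). Remapping zero to INT_MAX is an arbitrary choice for illustration, not something agreed on in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* FNV-1a, only a stand-in for OVS's hash_string(). */
static uint32_t
sketch_hash_string(const char *s)
{
    uint32_t h = 2166136261u;

    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/* Derives an ifindex from a port name as in the proposed patch: 'h >> 1'
 * clears bit 31 and XOR with the low 12 bits cannot set it again, so the
 * value fits in a non-negative int.  Zero is still possible, so it is
 * remapped (arbitrarily, to INT_MAX). */
static int
sketch_name_to_ifindex(const char *name)
{
    uint32_t h = sketch_hash_string(name);
    int ifindex = (int) ((h >> 1) ^ (h & 0x0FFF));

    assert(ifindex >= 0);
    return ifindex ? ifindex : INT_MAX;
}
```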

Thanks,

Daniele

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/1] dpif-netdev: Conditional EMC insert

2017-01-30 Thread Daniele Di Proietto
2017-01-26 9:55 GMT-08:00 Loftus, Ciara :
>>
>> 2017-01-25 7:52 GMT-08:00 Loftus, Ciara :
>> >> 2017-01-22 11:45 GMT-08:00 Jan Scheurich :
>> >> >
>> >> >> It's not a big deal, since the most important use case we have for
>> >> >> dpif-netdev is with dpdk, but I'd still like the code to behave
>> >> >> similarly on different platforms.  How about defining a function that
>> >> >> uses random_uint32 when compiling without DPDK?
>> >> >>
>> >> >> For testing it's not that simple, because unit tests can be run with
>> >> >> or without DPDK.  It would need to be configurable at runtime.
>> >> >> Perhaps making EM_FLOW_INSERT_PROB configurable at runtime would also
>> >> >> help people that want to experiment with different values, even
>> >> >> though, based on the comments, I guess they wouldn't really see much
>> >> >> difference.
>> >> >>
>> >> >> Again, what do you think about simply counting the packets and
>> >> >> inserting only 1 every EM_FLOW_INSERT_PROB?
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Daniele
>> >> >
>> >> >
>> >> > As far as I know Ciara did some quick tests with a counter-based
>> >> > implementation and it performed 5% worse for 1K and 4K flows than the
>> >> > current patch. Perhaps we could find the reason for that and fix it, but I
>> >> > also feel uncomfortable with deterministic insertion of every Nth flow. This
>> >> > could lead to very strange lock-step phenomena with typical artificial test
>> >> > workloads, which often generate flows round-robin. I would rather use a
>> >> > random function, as you suggest, or count "cycles" differently when
>> >> > compiling without DPDK.
>> >>
>> >> Ok, using another pseudo random function when compiling without DPDK
>> >> sounds good to me.
>> >>
>> >
>> > Any suggestions for the random function?
>>
>> I think we can use random_uint32() from lib/random.h
>>
>> >
>> >> >
>> >> > I agree with making the parameter EM_FLOW_INSERT_PROB configurable for unit
>> >> > test or other purposes. Should it be a new option in the Open_vSwitch table
>> >> > in OVSDB or rather a run-time parameter to be changed with ovs-appctl?
>> >>
>> >> I think a new option in Open_vSwitch other_config would be appropriate.
>> >
>> > I like this idea. I've started making these changes. How about something like the following?
>> >
>> > +  > > +  type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
>> > +
>> > +  Specifies the probability (1/emc-insert-prob) of a flow being
>> > +  inserted into the Exact Match Cache (EMC). Higher values of
>> > +  emc-insert-prob will result in less insertions, and lower
>> > +  values will result in more insertions. A value of zero will
>> > +  result in no insertions and essentially disable the EMC.
>> > +
>> > +
>> > +  Defaults to 100 ie. there is 1/100 chance of EMC insertion.
>>
>> Looks good to me, thanks.
>>
>> I would also add that this only applies to 'netdev' bridges (userspace) and that
>> a value of 1 means that every flow is going to be sent to the EMC.
>
> Thanks Daniele. I've posted a v2. Not sure it's the ideal approach so would 
> appreciate your feedback if you get a chance.
>
> On a separate note, I'm wondering whether now would be a good time to consider
> allowing the size of the EMC to be configurable, i.e. allowing EM_FLOW_HASH_SHIFT
> or a similar value to be modified. Whatever scheme is followed for
> modifying the insertion probability could probably easily be reused for EMC
> sizing too. Just an idea!

By the way, I forgot to mention that I like this idea.  Hopefully it
shouldn't introduce any overhead.
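For reference, the insertion check discussed in this thread (insert 1 flow in emc-insert-prob, 0 disables the EMC, 1 inserts every flow) could look roughly like the sketch below. It follows the threshold idea from the v2 patch (emc_insert_min = UINT32_MAX / 100), but the function names are hypothetical and a xorshift32 generator stands in for random_uint32() from lib/random.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Converts emc-insert-prob (insert 1 flow in N) into a threshold:
 * 0 disables insertion, 1 maps to UINT32_MAX (insert every flow). */
static uint32_t
emc_insert_threshold(uint32_t inv_prob)
{
    return inv_prob ? UINT32_MAX / inv_prob : 0;
}

/* Hypothetical stand-in for random_uint32(): xorshift32.  The state must be
 * non-zero. */
static uint32_t
sketch_random_uint32(uint32_t *state)
{
    uint32_t x = *state;

    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Returns true if a flow that missed the EMC should be inserted, given a
 * random value 'r' drawn uniformly from the 32-bit range. */
static bool
emc_should_insert(uint32_t threshold, uint32_t r)
{
    return threshold != 0 && r <= threshold;
}
```

Because the decision depends on a random draw rather than a packet counter, round-robin test traffic cannot fall into the lock-step pattern Jan describes.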

Thanks,

Daniele

>
> Thanks,
> Ciara
>
>>
>> >
>> > Thanks,
>> > Ciara
>> >
>> >>
>> >> Thanks,
>> >>
>> >> Daniele
>> >>
>> >> >
>> >> > Jan
>> >> >


Re: [ovs-dev] [PATCH 1/3] rhel: Fix ifdown for OVSDPDKBond.

2017-01-30 Thread Daniele Di Proietto

On 26/01/2017 11:11, "Aaron Conole" <acon...@redhat.com> wrote:

>Daniele Di Proietto <diproiet...@vmware.com> writes:
>
>> The OVSDPDKBond case wasn't handled in the rhel ifdown script.
>>
>> Fixes: f6bf8880613a ("rhel: Add support DPDK port creation via network scripts")
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> ---
>
>D'oh!
>
>Acked-by: Aaron Conole <acon...@redhat.com>

Thanks!  Pushed to master, branch-2.{7,6,5}


Re: [ovs-dev] [patch_v4 4/6] Unset CS_NEW for established connections.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 20:40 GMT-08:00 Darrell Ball :
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 34728a6..aaecb00 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -443,6 +443,7 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
>  switch (res) {
>  case CT_UPDATE_VALID:
>  *state |= CS_ESTABLISHED;
> +*state &= ~CS_NEW;

Maybe I'm missing something, but can *state be !=0 at this point?
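To make the question concrete: if *state always arrives as 0 on this path, the new line has no effect, because clearing CS_NEW only matters when that bit was already set. A small sketch with illustrative flag values (the SK_CS_* names and bit positions are hypothetical; the real CS_* flags are defined in OVS):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values, not the real CS_* definitions. */
enum {
    SK_CS_NEW         = 1 << 0,
    SK_CS_ESTABLISHED = 1 << 1,
    SK_CS_REPLY_DIR   = 1 << 3,
};

/* Mirrors the CT_UPDATE_VALID branch of the patch. */
static uint32_t
sketch_update_valid(uint32_t state, bool reply)
{
    state |= SK_CS_ESTABLISHED;
    state &= ~(uint32_t) SK_CS_NEW;   /* Only has an effect if NEW was set. */
    if (reply) {
        state |= SK_CS_REPLY_DIR;
    }
    return state;
}
```

Starting from 0 or from SK_CS_NEW produces the same result, so the line only matters if some earlier code can set CS_NEW.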

>  if (ctx->reply) {
>  *state |= CS_REPLY_DIR;
>  }
> --
> 1.9.1
>


Re: [ovs-dev] [patch_v4 3/6] Userspace Datapath: Introduce NAT support.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 20:40 GMT-08:00 Darrell Ball :
> This patch introduces NAT support for the userspace datapath.
> The conntrack module changes are in this patch.
>
> The per packet scope of lookups for NAT and un_NAT is at
> the bucket level rather than global. One hash table is
> introduced to support create/delete handling. The create/delete
> events may be further optimized, if the need becomes clear.
>
> Some NAT options with limited utility (persistent, random) are
> not supported yet, but will be supported in a later patch.
>
> Signed-off-by: Darrell Ball 

Sparse reports some problems:

../lib/conntrack.c:1375:16: warning: constant 0x is so big it is unsigned long
../lib/conntrack.c:1398:9: warning: constant 0x is so big it is unsigned long
../lib/conntrack.c:1400:36: warning: constant 0x is so big it is unsigned long
../lib/conntrack.c:1403:33: warning: constant 0x is so big it is unsigned long
../lib/conntrack.c:214:30: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:240:30: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1360:52: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1362:52: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1365:52: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1367:52: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1395:44: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1396:44: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1409:25: error: no member 's6_addr32' in struct in6_addr
../lib/conntrack.c:1410:25: error: no member 's6_addr32' in struct in6_addr
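s6_addr32 is a glibc extension rather than part of the portable struct in6_addr definition, which is probably why sparse does not see it. One portable alternative, sketched below with hypothetical helper names, is to copy 32-bit words in and out of s6_addr with memcpy(), which also avoids alignment concerns:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Reads 32-bit word 'i' (0..3) of an IPv6 address, in network byte order,
 * without relying on the non-portable s6_addr32 member. */
static uint32_t
in6_get_be32(const struct in6_addr *addr, int i)
{
    uint32_t word;

    assert(i >= 0 && i < 4);
    memcpy(&word, &addr->s6_addr[i * 4], sizeof word);
    return word;
}

/* Writes 32-bit word 'i' (0..3), given in network byte order. */
static void
in6_set_be32(struct in6_addr *addr, int i, uint32_t word)
{
    assert(i >= 0 && i < 4);
    memcpy(&addr->s6_addr[i * 4], &word, sizeof word);
}
```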

There are some minor coding style problems; you can find them with
utilities/checkpatch.py.

> ---
>  lib/conntrack-private.h |  16 +-
>  lib/conntrack.c | 740 ++--
>  lib/conntrack.h |  44 +++
>  3 files changed, 717 insertions(+), 83 deletions(-)
>
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 493865f..b71af37 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -51,14 +51,23 @@ struct conn_key {
>  uint16_t zone;
>  };
>
> +struct nat_conn_key_node {
> +struct hmap_node node;
> +struct conn_key key;
> +struct conn_key value;
> +};
> +
>  struct conn {
>  struct conn_key key;
>  struct conn_key rev_key;
>  long long expiration;
>  struct ovs_list exp_node;
>  struct hmap_node node;
> -uint32_t mark;
>  ovs_u128 label;
> +/* XXX: consider flattening. */
> +struct nat_action_info_t *nat_info;
> +uint32_t mark;
> +uint8_t conn_type;
>  };
>
>  enum ct_update_res {
> @@ -67,6 +76,11 @@ enum ct_update_res {
>  CT_UPDATE_NEW,
>  };
>
> +enum ct_conn_type {
> +CT_CONN_TYPE_DEFAULT,
> +   CT_CONN_TYPE_UN_NAT,
> +};
> +
>  struct ct_l4_proto {
>  struct conn *(*new_conn)(struct conntrack_bucket *, struct dp_packet *pkt,
>   long long now);
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 0a611a2..34728a6 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -76,6 +76,20 @@ static void set_label(struct dp_packet *, struct conn *,
>const struct ovs_key_ct_labels *mask);
>  static void *clean_thread_main(void *f_);
>
> +static struct nat_conn_key_node *
> +nat_conn_keys_lookup(struct hmap *nat_conn_keys,
> + const struct conn_key *key,
> + uint32_t basis);
> +
> +static void
> +nat_conn_keys_remove(struct hmap *nat_conn_keys,
> +const struct conn_key *key,
> +uint32_t basis);
> +
> +static bool
> +nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
> +  struct conn *nat_conn);
> +
>  static struct ct_l4_proto *l4_protos[] = {
>  [IPPROTO_TCP] = &ct_proto_tcp,
>  [IPPROTO_UDP] = &ct_proto_other,
> @@ -90,9 +104,11 @@ long long ct_timeout_val[] = {
>  };
>
>  /* If the total number of connections goes above this value, no new connections
> - * are accepted */
> + * are accepted; this is for CT_CONN_TYPE_DEFAULT connections. */
>  #define DEFAULT_N_CONN_LIMIT 300
>
> +#define DT
> +

I guess this was left over from debugging.

>  /* Initializes the connection tracker 'ct'.  The caller is responsible for
>   * calling 'conntrack_destroy()', when the instance is not needed anymore */
>  void
> @@ -101,6 +117,11 @@ conntrack_init(struct conntrack *ct)
>  unsigned i, j;
>  long long now = time_msec();
>
> +ct_rwlock_init(&ct->nat_resources_lock);
> +ct_rwlock_wrlock(&ct->nat_resources_lock);
> +hmap_init(&ct->nat_conn_keys);
> +ct_rwlock_unlock(&ct->nat_resources_lock);
> +
>  for (i = 0; i < CONNTRACK_BUCKETS; i++) {
>  struct conntrack_bucket *ctb = &ct->buckets[i];
>
> @@ -139,13 

Re: [ovs-dev] [patch_v4 2/6] Parse NAT netlink for userspace datapath.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 10:50 GMT-08:00 Darrell Ball :
> Signed-off-by: Darrell Ball 
> ---
>  lib/conntrack-private.h |  9 --
>  lib/conntrack.c |  3 +-
>  lib/conntrack.h | 31 +-
>  lib/dpif-netdev.c   | 85 ++---
>  tests/test-conntrack.c  |  8 +++--
>  5 files changed, 118 insertions(+), 18 deletions(-)
>
> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
> index 013f19f..493865f 100644
> --- a/lib/conntrack-private.h
> +++ b/lib/conntrack-private.h
> @@ -29,15 +29,6 @@
>  #include "packets.h"
>  #include "unaligned.h"
>
> -struct ct_addr {
> -union {
> -ovs_16aligned_be32 ipv4;
> -union ovs_16aligned_in6_addr ipv6;
> -ovs_be32 ipv4_aligned;
> -struct in6_addr ipv6_aligned;
> -};
> -};
> -
>  struct ct_endpoint {
>  struct ct_addr addr;
>  union {
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 9bea3d9..bae42a3 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -273,7 +273,8 @@ conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
>ovs_be16 dl_type, bool commit, uint16_t zone,
>const uint32_t *setmark,
>const struct ovs_key_ct_labels *setlabel,
> -  const char *helper)
> +  const char *helper,
> + const struct nat_action_info_t *nat_action_info OVS_UNUSED)
>  {
>  struct dp_packet **pkts = pkt_batch->packets;
>  size_t cnt = pkt_batch->count;
> diff --git a/lib/conntrack.h b/lib/conntrack.h
> index 254f61c..cbdfb91 100644
> --- a/lib/conntrack.h
> +++ b/lib/conntrack.h
> @@ -26,6 +26,8 @@
>  #include "openvswitch/thread.h"
>  #include "openvswitch/types.h"
>  #include "ovs-atomic.h"
> +#include "ovs-thread.h"
> +#include "packets.h"
>
>  /* Userspace connection tracker
>   * 
> @@ -61,6 +63,32 @@ struct dp_packet_batch;
>
>  struct conntrack;
>
> +struct ct_addr {
> +union {
> +ovs_16aligned_be32 ipv4;
> +union ovs_16aligned_in6_addr ipv6;
> +ovs_be32 ipv4_aligned;
> +struct in6_addr ipv6_aligned;
> +};
> +};
> +
> +// Both NAT_ACTION_* and NAT_ACTION_*_PORT can be set

We normally don't use // comments

> +enum nat_action_e {
> +   NAT_ACTION = 1 << 0,
> +   NAT_ACTION_SRC = 1 << 1,
> +   NAT_ACTION_SRC_PORT = 1 << 2,
> +   NAT_ACTION_DST = 1 << 3,
> +   NAT_ACTION_DST_PORT = 1 << 4,
> +};

This is indented with tabs instead of 4 spaces.

Is NAT_ACTION really necessary?  I think it should always be set when
nat_action_info is != NULL, so we can probably remove it.
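A sketch of what dropping NAT_ACTION might look like, assuming the presence of NAT is signalled only by a non-NULL nat_action_info pointer. The sketch_* names are hypothetical and the remaining bits are renumbered for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag set without a separate NAT_ACTION bit. */
enum sketch_nat_action {
    SKETCH_NAT_SRC      = 1 << 0,
    SKETCH_NAT_SRC_PORT = 1 << 1,
    SKETCH_NAT_DST      = 1 << 2,
    SKETCH_NAT_DST_PORT = 1 << 3,
};

struct sketch_nat_info {
    uint16_t min_port;
    uint16_t max_port;
    uint16_t nat_action;   /* Bitwise OR of enum sketch_nat_action. */
};

/* NAT is requested exactly when the caller passes non-NULL info. */
static bool
sketch_nat_requested(const struct sketch_nat_info *info)
{
    return info != NULL;
}

/* Source NAT is requested when info is present and has the SRC bit. */
static bool
sketch_is_snat(const struct sketch_nat_info *info)
{
    return info && (info->nat_action & SKETCH_NAT_SRC);
}
```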

> +
> +struct nat_action_info_t {
> +   struct ct_addr min_addr;
> +   struct ct_addr max_addr;
> +   uint16_t min_port;
> +   uint16_t max_port;

Tabs

> +uint16_t nat_action;
> +};
> +
>  void conntrack_init(struct conntrack *);
>  void conntrack_destroy(struct conntrack *);
>
> @@ -68,7 +96,8 @@ int conntrack_execute(struct conntrack *, struct dp_packet_batch *,
>ovs_be16 dl_type, bool commit,
>uint16_t zone, const uint32_t *setmark,
>const struct ovs_key_ct_labels *setlabel,
> -  const char *helper);
> +  const char *helper,
> +  const struct nat_action_info_t *nat_action_info);
>
>  struct conntrack_dump {
>  struct conntrack *ct;
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 3901129..a71c766 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -97,7 +97,8 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
>  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
>
>  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
> - | CS_INVALID | CS_REPLY_DIR | CS_TRACKED)
> + | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
> + | CS_SRC_NAT | CS_DST_NAT)
>  #define DP_NETDEV_CS_UNSUPPORTED_MASK (~(uint32_t)DP_NETDEV_CS_SUPPORTED_MASK)
>
>  static struct odp_support dp_netdev_support = {
> @@ -4681,7 +4682,9 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
>  const char *helper = NULL;
>  const uint32_t *setmark = NULL;
>  const struct ovs_key_ct_labels *setlabel = NULL;
> -
> +struct nat_action_info_t nat_action_info;
> +bool nat = false;
> +memset(&nat_action_info, 0, sizeof nat_action_info);

As discussed offline, can this memset  be moved inside the OVS_CT_ATTR_NAT case?
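A simplified sketch of moving the memset() into the NAT case, so the structure is only cleared when a NAT attribute is actually present. The attribute types and struct below are hypothetical stand-ins for OVS_CT_ATTR_* and nat_action_info_t, and a plain array replaces the netlink iteration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified attribute types standing in for OVS_CT_ATTR_*. */
enum sketch_ct_attr { SK_ATTR_COMMIT, SK_ATTR_ZONE, SK_ATTR_NAT };

struct sketch_attr {
    enum sketch_ct_attr type;
    uint16_t value;
};

struct sketch_nat_range {
    uint16_t min_port;
    uint16_t max_port;
};

/* Parses 'attrs'.  'nat_storage' is zeroed only if a NAT attribute is
 * present; in that case a pointer to it is returned, otherwise NULL. */
static const struct sketch_nat_range *
sketch_parse(const struct sketch_attr *attrs, size_t n,
             struct sketch_nat_range *nat_storage)
{
    const struct sketch_nat_range *nat = NULL;

    for (size_t i = 0; i < n; i++) {
        switch (attrs[i].type) {
        case SK_ATTR_COMMIT:
        case SK_ATTR_ZONE:
            break;
        case SK_ATTR_NAT:
            memset(nat_storage, 0, sizeof *nat_storage);
            nat_storage->min_port = attrs[i].value;
            nat_storage->max_port = attrs[i].value;
            nat = nat_storage;
            break;
        }
    }
    return nat;
}
```

Packets whose ct() action carries no NAT attribute then skip the memset() entirely.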

>  NL_ATTR_FOR_EACH_UNSAFE (b, left, nl_attr_get(a),
>   nl_attr_get_size(a)) {
>  enum ovs_ct_attr sub_type = nl_attr_type(b);
> @@ -4702,15 +4705,89 @@ dp_execute_cb(void 

Re: [ovs-dev] [patch_v4 0/6] Userspace Datapath: Introduce NAT support.

2017-01-27 Thread Daniele Di Proietto
2017-01-24 20:40 GMT-08:00 Darrell Ball :
> This patch series introduces NAT support for the userspace datapath.
>
> The per packet scope of lookups for NAT and un_NAT is at
> the bucket level rather than global. One hash table is
> introduced to support create/delete handling. The create/delete
> events may be further optimized, if the need becomes clear.
>
> The existing NAT tests are enabled for the dpdk datapath,
> with an added enhancement to the V6 NAT test.
>
> Some NAT options with limited utility (persistent, random) are
> not supported yet, but will be supported in a later patch.
>
> One V6 api is exported to facilitate selective editing the V6
> header - packet_set_ipv6_addr().
>
> alg and fragmentation support are not included here but are
> being worked on.
>
> NEWS is not updated in this series yet, until confirmation of
> release.
>
> I realize patch 3 is big. It may be clearer and easier to keep
> as a single patch, so I have done that after some discussion.

Thanks a lot for the series.  All the NAT and OVN system tests
are passing, which is great!

You can include an update to NEWS, it won't be pushed before the
rest of the series :-)

Usually we prefix the commit message with the name of the module
that the commit touches.

More comments are in the individual commits.

I'm sorry I don't have more meaningful comments yet.  I'll keep looking
at the series.

Thanks,

Daniele


>
> v3->v4: Fix rev_key vs key for nat_conn_keys access in a couple
> places; this would have affected cleanup; at same time
> rename some variables and change nat_conn_keys APIs to
> use conn key, rather than conn.
>
> Fix conntrack_flush() CT_CONN_TYPE_DEFAULT flag placement;
> the intention was that it be the same as in sweep_bucket().
>
> Fix nat_ipv6_addrs_delta() max boundary checking logic. I
> also enhanced the conntrack - IPv6 HTTP with NAT test to
> give it more coverage as partial penance.
>
> Rebase
>
> v2->v3: Fix a theoretical resend for closed connection restart.
> Parse out a function to help and also limit
> conn_state_update() to one.
>
> I decided to cap V6 address range delta at 4 billion using
> internal adjustment (user visibility not required).
>
> Some cleanup of deprecated code path.
>
> Parse out some more changes as separate patches.
>
> v1->v2: Updates/fixes that were missed in v1 patches.
>
> Darrell Ball (6):
>   Export packet_set_ipv6_addr() for dpdk datapath.
>   Parse NAT netlink for userspace datapath.
>   Userspace Datapath: Introduce NAT support.
>   Unset CS_NEW for established connections.
>   Enable NAT tests for userspace datapath.
>   Enhance V6 NAT test.
>
>  lib/conntrack-private.h  |  25 +-
>  lib/conntrack.c  | 742 ++-
>  lib/conntrack.h  |  73 +++-
>  lib/dpif-netdev.c|  85 -
>  lib/packets.c|   2 +-
>  lib/packets.h|   4 +
>  tests/system-traffic.at  |   4 +-
>  tests/system-userspace-macros.at |   7 +-
>  tests/test-conntrack.c   |   8 +-
>  9 files changed, 843 insertions(+), 107 deletions(-)
>
> --
> 1.9.1
>
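Regarding the v2->v3 note about capping the IPv6 address range delta at 4 billion: one way to compute such a capped delta is a 128-bit subtraction over four 32-bit words, saturating at UINT32_MAX whenever any of the upper three result words is non-zero. This is a hypothetical sketch, not the series' nat_ipv6_addrs_delta(), and it assumes max >= min:

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Returns max - min for two IPv6 addresses (max >= min assumed), capped at
 * UINT32_MAX.  Word 0 is the most significant, as laid out in s6_addr. */
static uint32_t
sketch_ipv6_delta_capped(const struct in6_addr *min,
                         const struct in6_addr *max)
{
    uint32_t diff[4];
    uint32_t borrow = 0;

    /* Subtract from the least significant word upwards, propagating the
     * borrow. */
    for (int i = 3; i >= 0; i--) {
        uint32_t a = 0, b = 0;

        for (int j = 0; j < 4; j++) {
            a = (a << 8) | max->s6_addr[i * 4 + j];
            b = (b << 8) | min->s6_addr[i * 4 + j];
        }
        uint64_t sub = (uint64_t) a - b - borrow;
        diff[i] = (uint32_t) sub;
        borrow = (uint32_t) (sub >> 63);   /* 1 if the subtraction wrapped. */
    }
    assert(borrow == 0);   /* Caller guarantees max >= min. */
    if (diff[0] || diff[1] || diff[2]) {
        return UINT32_MAX;
    }
    return diff[3];
}
```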


Re: [ovs-dev] [PATCH v2] dpif-netdev: Conditional EMC insert

2017-01-27 Thread Daniele Di Proietto
2017-01-26 9:51 GMT-08:00 Ciara Loftus :
> Unconditional insertion of EMC entries results in EMC thrashing at high
> numbers of parallel flows. When this occurs, the performance of the EMC
> often falls below that of the dpcls classifier, rendering the EMC
> practically useless.
>
> Instead of unconditionally inserting entries into the EMC when a miss
> occurs, use a 1% probability of insertion. This ensures that the most
> frequent flows have the highest chance of creating an entry in the EMC,
> and the probability of thrashing the EMC is also greatly reduced.
>
> The probability of insertion is configurable, via the
> other_config:emc-insert-prob option. For example the following command
> increases the insertion probability to 1/10, i.e. 10%.
>
> ovs-vsctl set Open_vSwitch . other_config:emc-insert-prob=10
>
> Signed-off-by: Ciara Loftus 
> Signed-off-by: Georg Schmuecking 
> Co-authored-by: Georg Schmuecking 

Thanks for v2

I think the patch doesn't compile without DPDK.  Also there's no way to control
the value without DPDK.

I think we could pass down the value like we do for pmd-cpu-mask, this would
make it work even without DPDK.  I sent a patch that extends what we do for
pmd-cpu-mask:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/328161.html

Can we avoid having to restart the daemon when we want to change this?

I think we should store the probability in 'struct dp_netdev' using an atomic
uint32.  We can read and write it using relaxed atomic operations, which
have no additional cost, like we do for 'enable_megaflows' in
ofproto-dpif-upcall.c.

If you want to store it in pmd without atomics, like Jan suggested, I think we
can use reconfiguration to change it at runtime.

Thanks,

Daniele
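A minimal sketch of the suggestion above: keep the insertion threshold in the datapath structure as an atomic uint32 accessed with relaxed operations. OVS code would use its own ovs-atomic.h wrappers; C11 <stdatomic.h> is used here so the example stands alone, and struct sketch_dp and its accessors are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical subset of a datapath struct: the EMC insertion threshold,
 * written by the configuration thread and read by pmd threads. */
struct sketch_dp {
    _Atomic uint32_t emc_insert_min;
};

/* Called from the configuration path when emc-insert-prob changes. */
static void
sketch_dp_set_insert_min(struct sketch_dp *dp, uint32_t min)
{
    atomic_store_explicit(&dp->emc_insert_min, min, memory_order_relaxed);
}

/* Called per packet by a pmd thread. */
static uint32_t
sketch_dp_get_insert_min(struct sketch_dp *dp)
{
    return atomic_load_explicit(&dp->emc_insert_min, memory_order_relaxed);
}
```

Relaxed ordering should be enough here because no other data is published together with the threshold; pmd threads only need to observe the new value eventually, and no daemon restart is required.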

> ---
> v2:
> - Enable probability configurability via other_config:emc-insert-prob
>   option.
>
>  Documentation/howto/dpdk.rst | 23 +++
>  NEWS |  2 ++
>  lib/dpdk.c   | 15 +++
>  lib/dpdk.h   |  1 +
>  lib/dpif-netdev.c| 28 ++--
>  vswitchd/vswitch.xml | 17 +
>  6 files changed, 84 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d1e6e89..a37b9d5 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -354,6 +354,29 @@ the `DPDK documentation
>
>  Note: Not all DPDK virtual PMD drivers have been tested and verified to work.
>
> +EMC Insertion Probability
> +-
> +By default 1 in every 100 flows are inserted into the Exact Match Cache (EMC).
> +It is possible to change this insertion probability by setting the
> +``emc-insert-prob`` option::
> +
> +$ ovs-vsctl --no-wait set Open_vSwitch . other_config:emc-insert-prob=N
> +
> +where:
> +
> +``N``
> +  is a positive integer between 0 and 4294967295.
> +
> +If ``N`` is set to 1, an insertion will be performed for every flow. The lower
> +the value of ``emc-insert-prob`` the higher the probability of insertion,
> +except for the value 0 which will result in no insertions being performed and
> +thus essentially disabling the EMC.
> +
> +If ``emc-insert-prob`` is modified, the daemon needs to be restarted in order
> +for the changes to take effect.
> +
> +For more information on the EMC refer to :doc:`/intro/install/dpdk` .
> +
>  .. _dpdk-ovs-in-guest:
>
>  OVS with DPDK Inside VMs
> diff --git a/NEWS b/NEWS
> index 0a9551c..8fb1f53 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -63,6 +63,8 @@ Post-v2.6.0
> device will not be available for use until a valid dpdk-devargs is
> specified.
>   * Virtual DPDK Poll Mode Driver (vdev PMD) support.
> + * New 'other_config:emc-insert-prob' field for userspace netdevs that
> +   allows definition of the EMC insertion probability.
> - Fedora packaging:
>   * A package upgrade does not automatically restart OVS service.
> - ovs-vswitchd/ovs-vsctl:
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 9ae2491..bb9e758 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -38,6 +38,8 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
>
>  static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
>
> +static uint32_t emc_insert_min = UINT32_MAX / 100;
> +
>  static int
>  process_vhost_flags(char *flag, char *default_val, int size,
>  const struct smap *ovs_other_config,
> @@ -272,6 +274,7 @@ dpdk_init__(const struct smap *ovs_other_config)
>  int err = 0;
>  cpu_set_t cpuset;
>  char *sock_dir_subcomponent;
> +int insert_prob;
>
>  if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
>  NAME_MAX, ovs_other_config,
> @@ -297,6 +300,12 @@ dpdk_init__(const struct smap *ovs_other_config)
>  vhost_sock_dir = 

[ovs-dev] [PATCH] dpif-netdev: Pass Openvswitch other_config smap to dpif.

2017-01-27 Thread Daniele Di Proietto
Currently we parse the 'other_config' column in the Open_vSwitch table in
bridge.c.  We extract the values (just 'pmd-cpu-mask' for now) and we
pass them down to the datapath, via different layers.

If we want to pass other values to dpif-netdev.c (like we recently
discussed) we would have to touch ofproto.c, ofproto-dpif.c and dpif.c.

This patch sends the entire other_config column to dpif-netdev, so that
dpif-netdev can extract the values it's interested in.

No functional change.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
I don't like that dpif-netdev receives the whole other_config column,
because it contains other values which are completely unrelated, but
unfortunately there's no better place in the database for datapath
specific configuration.
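With the whole column available, dpif_netdev_set_config() could fetch extra keys with smap_get(), as it already does for pmd-cpu-mask, and validate them locally. A hypothetical parser for an emc-insert-prob value, falling back to a default when the key is missing or malformed:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses an emc-insert-prob string taken from
 * other_config.  Returns 'dflt' when the key is absent (NULL) or the value
 * is not a plain decimal integer in [0, UINT32_MAX]. */
static uint32_t
sketch_parse_insert_prob(const char *value, uint32_t dflt)
{
    char *end;
    unsigned long long v;

    /* Reject NULL, empty strings, signs, and leading whitespace. */
    if (!value || !isdigit((unsigned char) *value)) {
        return dflt;
    }
    errno = 0;
    v = strtoull(value, &end, 10);
    if (errno || *end != '\0' || v > UINT32_MAX) {
        return dflt;
    }
    return (uint32_t) v;
}
```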
---
 lib/dpif-netdev.c  |  9 +
 lib/dpif-netlink.c |  2 +-
 lib/dpif-provider.h|  8 +++-
 lib/dpif.c | 12 ++--
 lib/dpif.h |  2 +-
 ofproto/ofproto-dpif.c | 19 ---
 ofproto/ofproto-provider.h | 11 ---
 ofproto/ofproto.c  | 13 +
 ofproto/ofproto.h  |  3 ++-
 vswitchd/bridge.c  | 18 +-
 10 files changed, 68 insertions(+), 29 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 719a51823..0be5db514 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2724,12 +2724,13 @@ dpif_netdev_operate(struct dpif *dpif, struct dpif_op **ops, size_t n_ops)
 }
 }
 
-/* Changes the number or the affinity of pmd threads.  The changes are actually
- * applied in dpif_netdev_run(). */
+/* Applies datapath configuration from the database. Some of the changes are
+ * actually applied in dpif_netdev_run(). */
 static int
-dpif_netdev_pmd_set(struct dpif *dpif, const char *cmask)
+dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
+const char *cmask = smap_get(other_config, "pmd-cpu-mask");
 
 if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
 free(dp->pmd_cmask);
@@ -4844,7 +4845,7 @@ const struct dpif_class dpif_netdev_class = {
 dpif_netdev_operate,
 NULL,   /* recv_set */
 NULL,   /* handlers_set */
-dpif_netdev_pmd_set,
+dpif_netdev_set_config,
 dpif_netdev_queue_to_priority,
 NULL,   /* recv */
 NULL,   /* recv_wait */
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index c8b0e37f9..9762a87be 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -2387,7 +2387,7 @@ const struct dpif_class dpif_netlink_class = {
 dpif_netlink_operate,
 dpif_netlink_recv_set,
 dpif_netlink_handlers_set,
-NULL,   /* poll_thread_set */
+NULL,   /* set_config */
 dpif_netlink_queue_to_priority,
 dpif_netlink_recv,
 dpif_netlink_recv_wait,
diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h
index d3b2bb91d..a0dc1ef35 100644
--- a/lib/dpif-provider.h
+++ b/lib/dpif-provider.h
@@ -326,11 +326,9 @@ struct dpif_class {
  * */
 int (*handlers_set)(struct dpif *dpif, uint32_t n_handlers);
 
-/* If 'dpif' creates its own I/O polling threads, refreshes poll threads
- * configuration.  'cmask' configures the cpu mask for setting the polling
- * threads' cpu affinity.  The implementation might postpone applying the
- * changes until run() is called. */
-int (*poll_threads_set)(struct dpif *dpif, const char *cmask);
+/* Pass custom configuration options to the datapath.  The implementation
+ * might postpone applying the changes until run() is called. */
+int (*set_config)(struct dpif *dpif, const struct smap *other_config);
 
 /* Translates OpenFlow queue ID 'queue_id' (in host byte order) into a
  * priority value used for setting packet priority. */
diff --git a/lib/dpif.c b/lib/dpif.c
index 374f013ab..57aa3c6c4 100644
--- a/lib/dpif.c
+++ b/lib/dpif.c
@@ -1440,17 +1440,17 @@ dpif_print_packet(struct dpif *dpif, struct dpif_upcall *upcall)
 }
 }
 
-/* If 'dpif' creates its own I/O polling threads, refreshes poll threads
- * configuration. */
+/* Pass custom configuration to the datapath implementation.  Some of the
+ * changes can be postponed until dpif_run() is called. */
 int
-dpif_poll_threads_set(struct dpif *dpif, const char *cmask)
+dpif_set_config(struct dpif *dpif, const struct smap *cfg)
 {
 int error = 0;
 
-if (dpif->dpif_class->poll_threads_set) {
-error = dpif->dpif_class->poll_threads_set(dpif, cmask);
+if (dpif->dpif_class->set_config) {
+error = dpif->dpif_class->set_config(dpif, cfg);
 if (error) {
-log_operation(dpif, "poll_threads_set", error);
+log_operation(dpif, "set_config", error);
 }
 }
 
diff --git a/l

Re: [ovs-dev] [PATCH] extract-ofp-fields: Define .TQ directive in nroff output.

2017-01-25 Thread Daniele Di Proietto
2017-01-25 20:31 GMT-08:00 Ben Pfaff <b...@ovn.org>:
> This missing directive caused groff warnings and probably some erroneous
> output too.
>
> Fixes: 96fee5e0a2a0 ("ovs-fields: New manpage to document Open vSwitch and OpenFlow fields.")
> Reported-by: Daniele Di Proietto <diproiet...@ovn.org>
> Signed-off-by: Ben Pfaff <b...@ovn.org>

Acked-by: Daniele Di Proietto <diproiet...@ovn.org>

> ---
>  build-aux/extract-ofp-fields | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/build-aux/extract-ofp-fields b/build-aux/extract-ofp-fields
> index 4c92246..333d90e 100755
> --- a/build-aux/extract-ofp-fields
> +++ b/build-aux/extract-ofp-fields
> @@ -714,6 +714,12 @@ def make_ovs_fields(meta_flow_h, meta_flow_xml):
>  .  ns
>  .  IP "$1"
>  ..
> +
> +.de TQ
> +.  br
> +.  ns
> +.  TP "$1"
> +..
>  .de URL
>  $2 \\(laURL: $1 \\(ra$3
>  ..
> --
> 2.10.2
>


Re: [ovs-dev] [PATCH] selinux: Allow creating tap devices.

2017-01-25 Thread Daniele Di Proietto

On 25/01/2017 00:01, "Ansis Atteka" <ansisatt...@gmail.com> wrote:

>
>
>On Jan 25, 2017 4:22 AM, "Daniele Di Proietto" <diproiet...@vmware.com> wrote:
>
>Current SELinux policy in RHEL and Fedora doesn't allow the creation of
>TAP devices.
>
>A tap device is used by dpif-netdev to create internal devices.
>
>Without this patch, adding any bridge backed by the userspace datapath
>would fail.
>
>This doesn't mean that we can run Open vSwitch with DPDK under SELinux
>yet, but at least we can use the userspace datapath.
>
>Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Acked-by: Ansis Atteka <aatt...@ovn.org>
>
>I saw that other open source projects like OpenVPN use the rw_file_perms shortcut
>macro. Not sure how relevant that is for OVS, but that macro expands to a
>few more permissions than what you have below. Maybe we don't need it,
>if what you have just worked.

Thanks a lot for the review.

I cooked this up using audit2allow and tested it on Fedora 25.  I'm now able
to create and delete userspace bridges without any further complaints from
SELinux.

I'm definitely not an expert in SELinux, so I'm not sure if it's better to use 
the macro and ask for extra permission, or to hardcode the list.

What do you think?

>
>---
> selinux/openvswitch-custom.te | 5 +
> 1 file changed, 5 insertions(+)
>
>diff --git a/selinux/openvswitch-custom.te b/selinux/openvswitch-custom.te
>index 47ddb562c..98de89c98 100644
>--- a/selinux/openvswitch-custom.te
>+++ b/selinux/openvswitch-custom.te
>@@ -5,8 +5,11 @@ require {
> type openvswitch_tmp_t;
> type ifconfig_exec_t;
> type hostname_exec_t;
>+type tun_tap_device_t;
> class netlink_socket { setopt getopt create connect getattr write read };
> class file { write getattr read open execute execute_no_trans };
>+class chr_file { ioctl open read write };
>+class tun_socket { create };
> }
>
> #= openvswitch_t ==
>@@ -14,3 +17,5 @@ allow openvswitch_t self:netlink_socket { setopt getopt create connect getattr w
> allow openvswitch_t hostname_exec_t:file { read getattr open execute execute_no_trans };
> allow openvswitch_t ifconfig_exec_t:file { read getattr open execute execute_no_trans };
> allow openvswitch_t openvswitch_tmp_t:file { execute execute_no_trans };
>+allow openvswitch_t self:tun_socket { create };
>+allow openvswitch_t tun_tap_device_t:chr_file { ioctl open read write };
>--
>2.11.0
>


[ovs-dev] [PATCH] selinux: Allow creating tap devices.

2017-01-24 Thread Daniele Di Proietto
Current SELinux policy in RHEL and Fedora doesn't allow the creation of
TAP devices.

A tap device is used by dpif-netdev to create internal devices.

Without this patch, adding any bridge backed by the userspace datapath
would fail.

This doesn't mean that we can run Open vSwitch with DPDK under SELinux
yet, but at least we can use the userspace datapath.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 selinux/openvswitch-custom.te | 5 +
 1 file changed, 5 insertions(+)

diff --git a/selinux/openvswitch-custom.te b/selinux/openvswitch-custom.te
index 47ddb562c..98de89c98 100644
--- a/selinux/openvswitch-custom.te
+++ b/selinux/openvswitch-custom.te
@@ -5,8 +5,11 @@ require {
 type openvswitch_tmp_t;
 type ifconfig_exec_t;
 type hostname_exec_t;
+type tun_tap_device_t;
 class netlink_socket { setopt getopt create connect getattr write read };
 class file { write getattr read open execute execute_no_trans };
+class chr_file { ioctl open read write };
+class tun_socket { create };
 }
 
 #= openvswitch_t ==
@@ -14,3 +17,5 @@ allow openvswitch_t self:netlink_socket { setopt getopt create connect getattr w
 allow openvswitch_t hostname_exec_t:file { read getattr open execute execute_no_trans };
 allow openvswitch_t ifconfig_exec_t:file { read getattr open execute execute_no_trans };
 allow openvswitch_t openvswitch_tmp_t:file { execute execute_no_trans };
+allow openvswitch_t self:tun_socket { create };
+allow openvswitch_t tun_tap_device_t:chr_file { ioctl open read write };
-- 
2.11.0



[ovs-dev] [PATCH 3/3] rhel: Fix ifup and ifdown after DPDK naming change.

2017-01-24 Thread Daniele Di Proietto
Names like dpdk0 and dpdk1 are not enough to identify a DPDK interface.
We could update README.RHEL.rst and add

OVS_EXTRA='set Interface ${DEVICE} options:dpdk-devargs=:01:00.0'

but a better solution is to add new parameters in the configuration file
to explicitly specify the dpdk-devargs.

Fixes: 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming")
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 rhel/README.RHEL.rst| 13 +
 rhel/etc_sysconfig_network-scripts_ifup-ovs |  6 --
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/rhel/README.RHEL.rst b/rhel/README.RHEL.rst
index afccf1703..af4589325 100644
--- a/rhel/README.RHEL.rst
+++ b/rhel/README.RHEL.rst
@@ -266,14 +266,16 @@ DPDK NIC port:
 
 ::
 
-==> ifcfg-dpdk0 <==
-DPDK vhost-user port:
-DEVICE=dpdk0
+==> ifcfg-mydpdk0 <==
+DEVICE=mydpdk0
+DPDK_DEVARGS=":01:00.0"
 ONBOOT=yes
 DEVICETYPE=ovs
 TYPE=OVSDPDKPort
 OVS_BRIDGE=obr0
 
+DPDK vhost-user port:
+
 ::
 
 ==> ifcfg-vhu0 <==
@@ -283,6 +285,8 @@ DPDK NIC port:
 TYPE=OVSDPDKVhostUserPort
 OVS_BRIDGE=obr0
 
+DPDK bond:
+
 ::
 
 ==> ifcfg-bond0 <==
@@ -292,7 +296,8 @@ DPDK NIC port:
 TYPE=OVSDPDKBond
 OVS_BRIDGE=ovsbridge0
 BOOTPROTO=none
-BOND_IFACES="dpdk0 dpdk1"
+BOND_IFACES="mydpdk0 mydpdk1"
+BOND_DPDK_DEVARGS=":01:00.0 :06:00.0"
 OVS_OPTIONS="bond_mode=active-backup"
 HOTPLUG=no
 
diff --git a/rhel/etc_sysconfig_network-scripts_ifup-ovs b/rhel/etc_sysconfig_network-scripts_ifup-ovs
index e49e6fe71..8fe60fcb1 100755
--- a/rhel/etc_sysconfig_network-scripts_ifup-ovs
+++ b/rhel/etc_sysconfig_network-scripts_ifup-ovs
@@ -170,7 +170,7 @@ case "$TYPE" in
ovs-vsctl -t ${TIMEOUT} \
-- --if-exists del-port "$OVS_BRIDGE" "$DEVICE" \
-- add-port "$OVS_BRIDGE" "$DEVICE" $OVS_OPTIONS \
-   -- set Interface "$DEVICE" type=dpdk ${OVS_EXTRA+-- $OVS_EXTRA}
+   -- set Interface "$DEVICE" type=dpdk options:dpdk-devargs="${DPDK_DEVARGS}" ${OVS_EXTRA+-- $OVS_EXTRA}
;;
OVSDPDKRPort)
ifup_ovs_bridge
@@ -188,8 +188,10 @@ case "$TYPE" in
;;
OVSDPDKBond)
ifup_ovs_bridge
+   set -- ${BOND_DPDK_DEVARGS}
for _iface in $BOND_IFACES; do
-   IFACE_TYPES="${IFACE_TYPES} -- set interface ${_iface} type=dpdk"
+   IFACE_TYPES="${IFACE_TYPES} -- set interface ${_iface} type=dpdk options:dpdk-devargs=$1"
+   shift
done
ovs-vsctl -t ${TIMEOUT} \
-- --if-exists del-port "$OVS_BRIDGE" "$DEVICE" \
-- 
2.11.0
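The OVSDPDKBond branch pairs the N-th word of BOND_IFACES with the N-th word of BOND_DPDK_DEVARGS by loading the latter into the positional parameters (set --) and consuming one with shift per interface. Below is a rough C equivalent of that pairing; the function names are hypothetical. One difference is deliberate: under bash, if BOND_DPDK_DEVARGS has fewer entries than BOND_IFACES, the loop expands an empty $1 and produces options:dpdk-devargs= with no value, whereas this sketch reports mismatched list lengths as an error.

```c
#include <stdio.h>
#include <string.h>

/* Copies the next space-separated word of '*p' into 'word' (truncated to
 * 'size' - 1 bytes) and advances '*p' past it.  Returns 0 when none is left. */
static int
next_word(const char **p, char *word, size_t size)
{
    size_t len, n;

    *p += strspn(*p, " ");                  /* Skip separators. */
    len = strcspn(*p, " ");                 /* Length of the next word. */
    if (len == 0) {
        return 0;
    }
    n = len < size - 1 ? len : size - 1;
    memcpy(word, *p, n);
    word[n] = '\0';
    *p += len;
    return 1;
}

/* Hypothetical helper: appends " -- set interface IFACE type=dpdk
 * options:dpdk-devargs=ARG" to 'out' for each positional IFACE/ARG pair.
 * Returns 0 on success, -1 if the lists differ in length or 'out' is full. */
int
build_bond_iface_args(const char *ifaces, const char *devargs,
                      char *out, size_t out_size)
{
    char iface[64], arg[64];
    size_t used = 0;
    int have_iface, have_arg;

    if (out_size == 0) {
        return -1;
    }
    out[0] = '\0';
    for (;;) {
        have_iface = next_word(&ifaces, iface, sizeof iface);
        have_arg = next_word(&devargs, arg, sizeof arg);
        if (!have_iface || !have_arg) {
            break;
        }
        int n = snprintf(out + used, out_size - used,
                         " -- set interface %s type=dpdk options:dpdk-devargs=%s",
                         iface, arg);
        if (n < 0 || (size_t) n >= out_size - used) {
            return -1;
        }
        used += (size_t) n;
    }
    /* Leftover words on either side mean the two lists were not aligned. */
    return (have_iface || have_arg) ? -1 : 0;
}
```

With BOND_IFACES="mydpdk0 mydpdk1" and two devargs entries, the sketch yields two "set interface" clauses in the same order as the shell loop.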



[ovs-dev] [PATCH 2/3] rhel: Remove obsolete OVSDPDKVhostPort from ifdown script.

2017-01-24 Thread Daniele Di Proietto
Support for vhost-cuse ports was removed long ago.

Fixes: 419876444357 ("netdev-dpdk: Remove dpdkvhostcuse ports")
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 rhel/etc_sysconfig_network-scripts_ifdown-ovs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rhel/etc_sysconfig_network-scripts_ifdown-ovs b/rhel/etc_sysconfig_network-scripts_ifdown-ovs
index 39884016c..8c9f3694c 100755
--- a/rhel/etc_sysconfig_network-scripts_ifdown-ovs
+++ b/rhel/etc_sysconfig_network-scripts_ifdown-ovs
@@ -59,7 +59,7 @@ case "$TYPE" in
OVSPatchPort|OVSTunnel)
ovs-vsctl -t ${TIMEOUT} -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE"
;;
-   OVSDPDKPort|OVSDPDKRPort|OVSDPDKVhostPort|OVSDPDKVhostUserPort|OVSDPDKBond)
+   OVSDPDKPort|OVSDPDKRPort|OVSDPDKVhostUserPort|OVSDPDKBond)
ovs-vsctl -t ${TIMEOUT} -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE"
;;
*)
-- 
2.11.0



Re: [ovs-dev] [PATCH 1/1] dpif-netdev: Conditional EMC insert

2017-01-23 Thread Daniele Di Proietto
2017-01-22 11:45 GMT-08:00 Jan Scheurich :
>
>> It's not a big deal, since the most important use case we have for
>> dpif-netdev is with dpdk, but I'd still like the code to behave
>> similarly on different platforms.  How about defining a function that
>> uses random_uint32 when compiling without DPDK?
>>
>> For testing it's not that simple, because unit tests can be run with
>> or without DPDK.  It would need to be configurable at runtime.
>> Perhaps making EM_FLOW_INSERT_PROB configurable at runtime would also
>> help people that want to experiment with different values, even
>> though, based on the comments, I guess they wouldn't really see much
>> difference.
>>
>> Again, what do you think about simply counting the packets and
>> inserting only 1 every EM_FLOW_INSERT_PROB?
>>
>> Thanks,
>>
>> Daniele
>
>
> As far as I know Ciara did some quick tests with a counter-based
> implementation and it performed 5% worse for 1K and 4K flows than the
> current patch. Perhaps we could find the reason for that and fix it, but I
> also feel uncomfortable with deterministic insertion of every Nth flow. This
> could lead to very strange lock-step phenomena with typical artificial test
> work loads, which often generate flows round-robin. I would rather use a
> random function, as you suggest, or count "cycles" differently when
> compiling without DPDK.

Ok, using another pseudo random function when compiling without DPDK sounds
good to me.
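To make the trade-off concrete, here is a small C sketch contrasting the two policies: probabilistic insertion driven by a portable PRNG, and deterministic insertion of every N-th packet. xorshift32 is a hypothetical stand-in for whatever non-DPDK generator would be used (random_uint32(), mentioned above, is one candidate); the function names and the "inv_prob == 0 disables insertion" convention are assumptions for illustration. With N flows sent round-robin and a counter that fires every N packets, the counter always picks the same flow, which is the lock-step effect Jan describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical portable PRNG (Marsaglia's xorshift32); '*state' must be nonzero. */
static uint32_t
xorshift32(uint32_t *state)
{
    uint32_t x = *state;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Probabilistic policy: insert with a probability of roughly 1/inv_prob.
 * inv_prob == 0 disables insertion (an assumed convention). */
bool
emc_should_insert_random(uint32_t *state, uint32_t inv_prob)
{
    if (inv_prob == 0) {
        return false;
    }
    return xorshift32(state) <= UINT32_MAX / inv_prob;
}

/* Deterministic policy: insert exactly every inv_prob-th packet. */
bool
emc_should_insert_counter(uint32_t *counter, uint32_t inv_prob)
{
    if (inv_prob == 0) {
        return false;
    }
    *counter += 1;
    return *counter % inv_prob == 0;
}
```

With 4 flows sent round-robin and inv_prob = 4, the counter policy inserts only the fourth flow, while the random policy spreads insertions across all of them.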

>
> I agree with making the parameter EM_FLOW_INSERT_PROB configurable for unit
> tests or other purposes. Should it be a new option in the Open_vSwitch table
> in OVSDB, or rather a run-time parameter to be changed with ovs-appctl?

I think a new option in the Open_vSwitch table's other_config would be appropriate.
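If the inverse probability becomes an other_config option, its string value has to be validated before use. A minimal sketch, assuming a hypothetical key whose value is a decimal integer, an assumed default of 100, and "0" meaning "never insert":

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEFAULT_INV_PROB 100   /* Assumed default: insert about 1 packet in 100. */

/* Parses the value of a hypothetical other_config key holding the inverse
 * insertion probability.  Missing or malformed values fall back to the
 * default; "0" is passed through and would mean "never insert". */
uint32_t
parse_inv_prob(const char *value)
{
    char *end;
    unsigned long v;

    if (!value || value[0] < '0' || value[0] > '9') {
        return DEFAULT_INV_PROB;   /* NULL, empty, signed or non-numeric. */
    }
    errno = 0;
    v = strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT32_MAX) {
        return DEFAULT_INV_PROB;
    }
    return (uint32_t) v;
}
```

Rejecting a leading sign explicitly matters because strtoul() silently negates "-1" into a huge unsigned value.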

Thanks,

Daniele

>
> Jan
>


Re: [ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles

2017-01-20 Thread Daniele Di Proietto
2017-01-20 5:59 GMT-08:00 Jan Scheurich <jan.scheur...@web.de>:
>
>
> On 2017-01-18 17:32, Kevin Traynor wrote:
>>
>> On 01/18/2017 01:34 AM, Daniele Di Proietto wrote:
>>>
>>> 2017-01-17 11:43 GMT-08:00 Kevin Traynor <ktray...@redhat.com>:
>>>>
>>>> On 01/17/2017 05:43 PM, Ciara Loftus wrote:
>>>>>
>>>>> Instead of counting all polling cycles as processing cycles, only count
>>>>> the cycles where packets were received from the polling.
>>>>
>>>> This makes these stats much clearer. One minor comment below, other than
>>>> that
>>>>
>>>> Acked-by: Kevin Traynor <ktray...@redhat.com>
>>>>
>>>>> Signed-off-by: Georg Schmuecking <georg.schmueck...@ericsson.com>
>>>>> Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
>>>>> Co-authored-by: Ciara Loftus <ciara.lof...@intel.com>
>>>
>>> Minor: the co-authored-by tag should be different from the main author.
>>>
>>> This makes it easier to understand how busy a pmd thread is, a valid
>>> question that a sysadmin might have.
>>>
>>> The counters were originally introduced to help developers understand how
>>> cycles are spent between drivers (netdev rx) and datapath processing (dpif).
>>> Do you think it's ok to lose this type of information?  Perhaps it is, since
>>> a developer can also use a profiler, I'm not sure.
>>>
>>> Maybe we could keep 'last_cycles' as it is and introduce a separate counter
>>> to get the idle/busy ratio.  I'm not 100% sure this is the best way.
>>>
>>> What do you guys think?
>>>
>> I've only ever used the current stats for trying to estimate if polling
>> was getting packets or not, so the addition of an idle stat helps that.
>> I like your suggestion of having all three stats, so then it would be
>> something like:
>>
>> polling unsuccessful (idle)
>> polling successful (got pkts)
>> processing pkts
>>
>> That would keep the info for a developer and it could help initial debug
>> if pkt rates drop on a pmd.
>>
>> Kevin.
>
>
> From an operational perspective, the most important data is clearly the
> fraction of busy cycles. Any additional breakdown of busy cycles is
> debatable. We have always been wondering why Rx cost was accounted for
> separately in the current code, while Tx cost was included in the
> processing. That didn't make much sense to us.
>
> A developer should be able to split the busy cycles between Rx polling,
> processing (parsing, EMC lookup, dpcls lookup, upcall(!), actions) and Tx to
> port by analysing "perf top" output, as we have done in the analysis for our
> performance patches, or using a fancier profiler.

Thanks Kevin and Jan, based on the above discussion I think we can remove
the distinction between successfully polling and processing, meaning that the
patch is good.

Since we want to expose this to the user rather than the developer, I think that
the documentation in vswitchd/ovs-vswitchd.8.in should explain the meaning
of idle cycles and processing cycles.
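One way to describe the new accounting, sketched in C with hypothetical names: each poll iteration is charged as a whole either to "idle" (the poll returned no packets) or to "processing" (packets were received, so the interval covers rx, classification, actions and tx). This follows the idea behind cycles_count_intermediate() in the patch, but the struct and functions below are simplified stand-ins, and the 'now' argument replaces a real cycles_counter() read.

```c
#include <stdint.h>

enum cycles_type { CYCLES_IDLE, CYCLES_PROCESSING, N_CYCLES_TYPES };

/* Hypothetical, simplified per-thread cycle accounting. */
struct pmd_cycles {
    uint64_t last;               /* Counter value at the previous sample. */
    uint64_t n[N_CYCLES_TYPES];  /* Cycles accumulated per category. */
};

/* Charges the cycles elapsed since the previous sample to 'type'. */
void
cycles_account(struct pmd_cycles *c, uint64_t now, enum cycles_type type)
{
    c->n[type] += now - c->last;
    c->last = now;
}

/* One poll iteration: processing if the poll returned packets, idle otherwise. */
void
cycles_poll_done(struct pmd_cycles *c, uint64_t now, int batch_cnt)
{
    cycles_account(c, now, batch_cnt > 0 ? CYCLES_PROCESSING : CYCLES_IDLE);
}

/* Share of 'type' in all accounted cycles, in percent (0 if nothing yet). */
double
cycles_pct(const struct pmd_cycles *c, enum cycles_type type)
{
    uint64_t total = c->n[CYCLES_IDLE] + c->n[CYCLES_PROCESSING];

    return total ? (double) c->n[type] * 100.0 / (double) total : 0.0;
}
```

For example, an empty poll ending at cycle 100, a 32-packet poll ending at 400 and another empty poll ending at 500 give 200 idle cycles (40%) and 300 processing cycles (60%).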

>
> One additional metric that would be interesting to see in pmd_stats_show,
> however, is the average number of packets per batch polled from a port (or
> recirculated).

Good point.  Maybe we can address this in a separate patch?
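The average-batch-size metric could be derived from two extra counters. A minimal sketch with hypothetical names, assuming empty polls are excluded so they do not drag the average down:

```c
#include <stdint.h>

/* Hypothetical counters for the average packets-per-batch metric. */
struct batch_stats {
    uint64_t n_batches;   /* Non-empty batches polled or recirculated. */
    uint64_t n_packets;   /* Packets contained in those batches. */
};

void
batch_stats_add(struct batch_stats *s, int batch_cnt)
{
    if (batch_cnt > 0) {   /* Empty polls say nothing about batch size. */
        s->n_batches++;
        s->n_packets += (uint64_t) batch_cnt;
    }
}

double
batch_stats_avg(const struct batch_stats *s)
{
    return s->n_batches ? (double) s->n_packets / (double) s->n_batches : 0.0;
}
```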

>
> Regards, Jan
>


Re: [ovs-dev] [PATCH] Documentation: Update DPDK doc after port naming change.

2017-01-19 Thread Daniele Di Proietto





On 19/01/2017 03:12, "Loftus, Ciara" <ciara.lof...@intel.com> wrote:

>> 
>> options:dpdk-devargs is always required now.  This commit also changes
>> some of the names from 'dpdk0' to various others.
>> 
>> netdev-dpdk/detach accepts a PCI id instead of a port name.
>> 
>> CC: Ciara Loftus <ciara.lof...@intel.com>
>> Fixes: 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming")
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>Patch looks good. Thanks for the fixes!
>
>Acked-by: Ciara Loftus <ciara.lof...@intel.com>

Thanks! Pushed to master

>
>> ---
>>  Documentation/howto/dpdk.rst| 77 
>> -
>>  Documentation/howto/userspace-tunneling.rst |  2 +-
>>  2 files changed, 43 insertions(+), 36 deletions(-)
>> 
>> diff --git a/Documentation/howto/dpdk.rst
>> b/Documentation/howto/dpdk.rst
>> index fbb4b5361..d1e6e899f 100644
>> --- a/Documentation/howto/dpdk.rst
>> +++ b/Documentation/howto/dpdk.rst
>> @@ -44,8 +44,10 @@ ovs-vsctl can also be used to add DPDK devices. OVS expects DPDK device names
>>  to start with ``dpdk`` and end with a portid. ovs-vswitchd should print the
>>  number of dpdk devices found in the log file::
>> 
>> -$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>> -$ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
>> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>> +options:dpdk-devargs=:01:00.0
>> +$ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
>> +options:dpdk-devargs=:01:00.1
>> 
>>  After the DPDK ports get added to switch, a polling thread continuously polls
>>  DPDK devices and consumes 100% of the core, as can be checked from ``top`` and
>> @@ -55,12 +57,12 @@ DPDK devices and consumes 100% of the core, as can be checked from ``top`` and
>>  $ ps -eLo pid,psr,comm | grep pmd
>> 
>>  Creating bonds of DPDK interfaces is slightly different to creating bonds of
>> -system interfaces. For DPDK, the interface type must be explicitly set. For
>> -example::
>> +system interfaces. For DPDK, the interface type and devargs must be explicitly
>> +set. For example::
>> 
>> -$ ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 \
>> --- set Interface dpdk0 type=dpdk \
>> --- set Interface dpdk1 type=dpdk
>> +$ ovs-vsctl add-bond br0 dpdkbond p0 p1 \
>> +-- set Interface p0 type=dpdk options:dpdk-devargs=:01:00.0 \
>> +-- set Interface p1 type=dpdk options:dpdk-devargs=:01:00.1
>> 
>>  To stop ovs-vswitchd & delete bridge, run::
>> 
>> @@ -98,7 +100,7 @@ where:
>> 
>>  For example::
>> 
>> -$ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>> +$ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
>>  other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>> 
>>  This will ensure:
>> @@ -165,27 +167,27 @@ Flow Control
>>  Flow control can be enabled only on DPDK physical ports. To enable flow control
>>  support at tx side while adding a port, run::
>> 
>> -$ ovs-vsctl add-port br0 dpdk0 -- \
>> -set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true
>> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>> +options:dpdk-devargs=:01:00.0 options:tx-flow-ctrl=true
>> 
>>  Similarly, to enable rx flow control, run::
>> 
>> -$ ovs-vsctl add-port br0 dpdk0 -- \
>> -set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true
>> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>> +options:dpdk-devargs=:01:00.0 options:rx-flow-ctrl=true
>> 
>>  To enable flow control auto-negotiation, run::
>> 
>> -$ ovs-vsctl add-port br0 dpdk0 -- \
>> -set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true
>> +$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
>> +options:dpdk-devargs=:01:00.0 options:flow-ctrl-autoneg=true
>> 
>>  To turn ON the tx flow control at run time for an existing port, run::
>> 
>> -$ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true
>> +$ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=true
>> 
>>  The flow control parameters can be turned off by setting ``false`` to the
>>  respective parameter. To disable the flow control at tx side, run::
&

Re: [ovs-dev] [PATCH] configuration.rst: Update the example of DPDK port's configuration

2017-01-18 Thread Daniele Di Proietto
2017-01-18 15:18 GMT-08:00 Daniele Di Proietto <diproiet...@ovn.org>:
> 2017-01-18 11:55 GMT-08:00 Binbin Xu <xu.binb...@zte.com.cn>:
>> Since DPDK ports became hotpluggable, a valid dpdk-devargs must be
>> specified; otherwise, the DPDK device is not available.
>>
>> Signed-off-by: Binbin Xu <xu.binb...@zte.com.cn>
>
> Thanks! Applied to master and branch-2.7

I realized that we forgot to update the documentation in other places,
so I sent a patch here:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327782.html

>
>> ---
>>  Documentation/faq/configuration.rst | 7 +++
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/faq/configuration.rst b/Documentation/faq/configuration.rst
>> index c03d069..8bd0e11 100644
>> --- a/Documentation/faq/configuration.rst
>> +++ b/Documentation/faq/configuration.rst
>> @@ -107,12 +107,11 @@ Q: How do I configure a DPDK port as an access port?
>>  startup when other_config:dpdk-init is set to 'true'.
>>
>>  Secondly, when adding a DPDK port, unlike a system port, the type for the
>> -interface must be specified. For example::
>> +interface and valid dpdk-devargs must be specified. For example::
>>
>>  $ ovs-vsctl add-br br0
>> -$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>> -
>> -Finally, it is required that DPDK port names begin with ``dpdk``.
>> +$ ovs-vsctl add-port br0 myportname -- set Interface myportname \
>> +type=dpdk options:dpdk-devargs=:06:00.0
>>
>>  Refer to :doc:`/intro/install/dpdk` for more information on enabling and
>>  using DPDK with Open vSwitch.
>> --
>> 1.8.3.1
>>
>>


[ovs-dev] [PATCH] Documentation: Update DPDK doc after port naming change.

2017-01-18 Thread Daniele Di Proietto
options:dpdk-devargs is always required now.  This commit also changes
some of the names from 'dpdk0' to various others.

netdev-dpdk/detach accepts a PCI id instead of a port name.

CC: Ciara Loftus <ciara.lof...@intel.com>
Fixes: 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming")
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 Documentation/howto/dpdk.rst| 77 -
 Documentation/howto/userspace-tunneling.rst |  2 +-
 2 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index fbb4b5361..d1e6e899f 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -44,8 +44,10 @@ ovs-vsctl can also be used to add DPDK devices. OVS expects DPDK device names
 to start with ``dpdk`` and end with a portid. ovs-vswitchd should print the
 number of dpdk devices found in the log file::
 
-$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
-$ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+options:dpdk-devargs=:01:00.0
+$ ovs-vsctl add-port br0 dpdk-p1 -- set Interface dpdk-p1 type=dpdk \
+options:dpdk-devargs=:01:00.1
 
 After the DPDK ports get added to switch, a polling thread continuously polls
 DPDK devices and consumes 100% of the core, as can be checked from ``top`` and
@@ -55,12 +57,12 @@ DPDK devices and consumes 100% of the core, as can be checked from ``top`` and
 $ ps -eLo pid,psr,comm | grep pmd
 
 Creating bonds of DPDK interfaces is slightly different to creating bonds of
-system interfaces. For DPDK, the interface type must be explicitly set. For
-example::
+system interfaces. For DPDK, the interface type and devargs must be explicitly
+set. For example::
 
-$ ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 \
--- set Interface dpdk0 type=dpdk \
--- set Interface dpdk1 type=dpdk
+$ ovs-vsctl add-bond br0 dpdkbond p0 p1 \
+-- set Interface p0 type=dpdk options:dpdk-devargs=:01:00.0 \
+-- set Interface p1 type=dpdk options:dpdk-devargs=:01:00.1
 
 To stop ovs-vswitchd & delete bridge, run::
 
@@ -98,7 +100,7 @@ where:
 
 For example::
 
-$ ovs-vsctl set interface dpdk0 options:n_rxq=4 \
+$ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
 other_config:pmd-rxq-affinity="0:3,1:7,3:8"
 
 This will ensure:
@@ -165,27 +167,27 @@ Flow Control
 Flow control can be enabled only on DPDK physical ports. To enable flow control
 support at tx side while adding a port, run::
 
-$ ovs-vsctl add-port br0 dpdk0 -- \
-set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+options:dpdk-devargs=:01:00.0 options:tx-flow-ctrl=true
 
 Similarly, to enable rx flow control, run::
 
-$ ovs-vsctl add-port br0 dpdk0 -- \
-set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+options:dpdk-devargs=:01:00.0 options:rx-flow-ctrl=true
 
 To enable flow control auto-negotiation, run::
 
-$ ovs-vsctl add-port br0 dpdk0 -- \
-set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+options:dpdk-devargs=:01:00.0 options:flow-ctrl-autoneg=true
 
 To turn ON the tx flow control at run time for an existing port, run::
 
-$ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true
+$ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=true
 
 The flow control parameters can be turned off by setting ``false`` to the
 respective parameter. To disable the flow control at tx side, run::
 
-$ ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false
+$ ovs-vsctl set Interface dpdk-p0 options:tx-flow-ctrl=false
 
 pdump
 -
@@ -234,13 +236,12 @@ enable Jumbo Frames support for a DPDK port, change the Interface's
 ``mtu_request`` attribute to a sufficiently large value. For example, to add a
 DPDK Phy port with MTU of 9000::
 
-$ ovs-vsctl add-port br0 dpdk0 \
-  -- set Interface dpdk0 type=dpdk \
-  -- set Interface dpdk0 mtu_request=9000`
+$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk \
+  options:dpdk-devargs=:01:00.0 mtu_request=9000
 
 Similarly, to change the MTU of an existing port to 6200::
 
-$ ovs-vsctl set Interface dpdk0 mtu_request=6200
+$ ovs-vsctl set Interface dpdk-p0 mtu_request=6200
 
 Some additional configuration is needed to take advantage of jumbo frames with
 vHost ports:
@@ -280,14 +281,14 @@ By default, DPDK physical ports are enabled with Rx checksum offload. Rx
 checksum offload can be configured on a DPDK physical port either when adding
 or at run time.
 

Re: [ovs-dev] [PATCH] configuration.rst: Update the example of DPDK port's configuration

2017-01-18 Thread Daniele Di Proietto
2017-01-18 11:55 GMT-08:00 Binbin Xu :
> Since DPDK ports became hotpluggable, a valid dpdk-devargs must be
> specified; otherwise, the DPDK device is not available.
>
> Signed-off-by: Binbin Xu 

Thanks! Applied to master and branch-2.7

> ---
>  Documentation/faq/configuration.rst | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/faq/configuration.rst b/Documentation/faq/configuration.rst
> index c03d069..8bd0e11 100644
> --- a/Documentation/faq/configuration.rst
> +++ b/Documentation/faq/configuration.rst
> @@ -107,12 +107,11 @@ Q: How do I configure a DPDK port as an access port?
>  startup when other_config:dpdk-init is set to 'true'.
>
>  Secondly, when adding a DPDK port, unlike a system port, the type for the
> -interface must be specified. For example::
> +interface and valid dpdk-devargs must be specified. For example::
>
>  $ ovs-vsctl add-br br0
> -$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
> -
> -Finally, it is required that DPDK port names begin with ``dpdk``.
> +$ ovs-vsctl add-port br0 myportname -- set Interface myportname \
> +type=dpdk options:dpdk-devargs=:06:00.0
>
>  Refer to :doc:`/intro/install/dpdk` for more information on enabling and
>  using DPDK with Open vSwitch.
> --
> 1.8.3.1
>
>


Re: [ovs-dev] [PATCH v2] dpif-netdev: Change definitions of 'idle' & 'processing' cycles

2017-01-17 Thread Daniele Di Proietto
2017-01-17 11:43 GMT-08:00 Kevin Traynor :
> On 01/17/2017 05:43 PM, Ciara Loftus wrote:
>> Instead of counting all polling cycles as processing cycles, only count
>> the cycles where packets were received from the polling.
>
> This makes these stats much clearer. One minor comment below, other than
> that
>
> Acked-by: Kevin Traynor 
>
>>
>> Signed-off-by: Georg Schmuecking 
>> Signed-off-by: Ciara Loftus 
>> Co-authored-by: Ciara Loftus 

Minor: the co-authored-by tag should be different from the main author.

This makes it easier to understand how busy a pmd thread is, a valid question
that a sysadmin might have.

The counters were originally introduced to help developers understand how
cycles are spent between drivers (netdev rx) and datapath processing (dpif).
Do you think it's ok to lose this type of information?  Perhaps it is, since
a developer can also use a profiler, I'm not sure.

Maybe we could keep 'last_cycles' as it is and introduce a separate counter to
get the idle/busy ratio.  I'm not 100% sure this is the best way.

What do you guys think?

Thanks,

Daniele

>> ---
>> v2:
>> - Rebase
>> ---
>>  lib/dpif-netdev.c | 57 
>> ++-
>>  1 file changed, 44 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 3901129..3854c79 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -272,7 +272,10 @@ enum dp_stat_type {
>>
>>  enum pmd_cycles_counter_type {
>>  PMD_CYCLES_POLLING, /* Cycles spent polling NICs. */
>
> this is not used anymore and can be removed
>
>> -PMD_CYCLES_PROCESSING,  /* Cycles spent processing packets */
>> +PMD_CYCLES_IDLE,/* Cycles spent idle or unsuccessful polling */
>> +PMD_CYCLES_PROCESSING,  /* Cycles spent successfully polling and
>> + * processing polled packets */
>> +
>>  PMD_N_CYCLES
>>  };
>>
>> @@ -747,10 +750,10 @@ pmd_info_show_stats(struct ds *reply,
>>  }
>>
>>  ds_put_format(reply,
>> -  "\tpolling cycles:%"PRIu64" (%.02f%%)\n"
>> +  "\tidle cycles:%"PRIu64" (%.02f%%)\n"
>>"\tprocessing cycles:%"PRIu64" (%.02f%%)\n",
>> -  cycles[PMD_CYCLES_POLLING],
>> -  cycles[PMD_CYCLES_POLLING] / (double)total_cycles * 100,
>> +  cycles[PMD_CYCLES_IDLE],
>> +  cycles[PMD_CYCLES_IDLE] / (double)total_cycles * 100,
>>cycles[PMD_CYCLES_PROCESSING],
>>cycles[PMD_CYCLES_PROCESSING] / (double)total_cycles * 100);
>>
>> @@ -2892,30 +2895,43 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>>  non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>>  }
>>
>> -static void
>> +/* Calculate the intermediate cycle result and add to the counter 'type' */
>> +static inline void
>> +cycles_count_intermediate(struct dp_netdev_pmd_thread *pmd,
>> +  enum pmd_cycles_counter_type type)

I'd add an OVS_REQUIRES(_counter_fake_mutex)

>> +OVS_NO_THREAD_SAFETY_ANALYSIS
>> +{
>> +unsigned long long new_cycles = cycles_counter();
>> +unsigned long long interval = new_cycles - pmd->last_cycles;
>> +pmd->last_cycles = new_cycles;
>> +
>> +non_atomic_ullong_add(&pmd->cycles.n[type], interval);
>> +}
>> +
>> +static int
>>  dp_netdev_process_rxq_port(struct dp_netdev_pmd_thread *pmd,
>> struct netdev_rxq *rx,
>> odp_port_t port_no)
>>  {
>>  struct dp_packet_batch batch;
>>  int error;
>> +int batch_cnt = 0;
>>
>>  dp_packet_batch_init(&batch);
>> -cycles_count_start(pmd);
>>  error = netdev_rxq_recv(rx, &batch);
>> -cycles_count_end(pmd, PMD_CYCLES_POLLING);
>>  if (!error) {
>>  *recirc_depth_get() = 0;
>>
>> -cycles_count_start(pmd);
>> +batch_cnt = batch.count;
>>  dp_netdev_input(pmd, &batch, port_no);
>> -cycles_count_end(pmd, PMD_CYCLES_PROCESSING);
>>  } else if (error != EAGAIN && error != EOPNOTSUPP) {
>>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>>
>>  VLOG_ERR_RL(&rl, "error receiving data from %s: %s",
>>  netdev_rxq_get_name(rx), ovs_strerror(error));
>>  }
>> +
>> +return batch_cnt;
>>  }
>>
>>  static struct tx_port *
>> @@ -3377,21 +3393,27 @@ dpif_netdev_run(struct dpif *dpif)
>>  struct dp_netdev *dp = get_dp_netdev(dpif);
>>  struct dp_netdev_pmd_thread *non_pmd;
>>  uint64_t new_tnl_seq;
>> +int process_packets = 0;
>>
>>  ovs_mutex_lock(>port_mutex);
>>  non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
>>  if (non_pmd) {
>>  ovs_mutex_lock(>non_pmd_mutex);
>> +cycles_count_start(non_pmd);
>>  HMAP_FOR_EACH (port, node, >ports) {
>>

Re: [ovs-dev] [PATCH] dpif-netdev: Avoids repeated addition of DP_STAT_LOST.

2017-01-16 Thread Daniele Di Proietto





On 16/01/2017 09:31, "Ben Pfaff" <b...@ovn.org> wrote:

>On Mon, Jan 16, 2017 at 04:56:39AM -0800, nickcooper-zhangtonghao wrote:
>> Signed-off-by: nickcooper-zhangtonghao <n...@opencloud.tech>
>> ---
>>  lib/dpif-netdev.c | 1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 08167b5..3901129 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -4258,7 +4258,6 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
>>  ofpbuf_uninit();
>>  ofpbuf_uninit(_actions);
>>  fat_rwlock_unlock(>upcall_rwlock);
>> -dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
>>  } else if (OVS_UNLIKELY(any_miss)) {
>>  for (i = 0; i < cnt; i++) {
>>          if (OVS_UNLIKELY(!rules[i])) {
>
>Acked-by: Ben Pfaff <b...@ovn.org>
>
>I believe that this also should be tagged:
>
>CC: Daniele Di Proietto <diproiet...@vmware.com>
>Fixes: 8aaa125dab66 ("dpif-netdev: Share emc and fast path output batches.")
>
>Since this dates to May 2015 and DPDK isn't really my area, I'll leave
>this to Daniele for final application.

LGTM as well, I added the tags and applied this to master, branch-2.6 and branch-2.5

Thanks!


Re: [ovs-dev] [PATCH] netdev-vport: Do not log empty warnings on success.

2017-01-12 Thread Daniele Di Proietto





On 12/01/2017 09:33, "Ben Pfaff" <b...@ovn.org> wrote:

>On Thu, Jan 12, 2017 at 12:23:55AM -0800, Daniele Di Proietto wrote:
>> set_tunnel_config() always logs a warning, even on success. This
>> shouldn't happen.
>> 
>> Without this, some unit tests fail.
>> 
>> Fixes: 9fff138ec3a6("netdev: Add 'errp' to set_config().")
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> ---
>>  lib/netdev-vport.c | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>> 
>> diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
>> index ad5ffcc81..2db51df72 100644
>> --- a/lib/netdev-vport.c
>> +++ b/lib/netdev-vport.c
>> @@ -561,10 +561,12 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
>>  err = 0;
>>  
>>  out:
>> -ds_chomp(&errors, '\n');
>> -VLOG_WARN("%s", ds_cstr(&errors));
>> -if (err) {
>> -*errp = ds_steal_cstr(&errors);
>> +if (*ds_cstr(&errors)) {
>
>How about "if (errors.length)" instead?

Ok

>
>> +ds_chomp(&errors, '\n');
>> +VLOG_WARN("%s", ds_cstr(&errors));
>> +if (err) {
>> +*errp = ds_steal_cstr(&errors);
>> +}
>>  }
>>  
>>  ds_destroy(&errors);
>
>Acked-by: Ben Pfaff <b...@ovn.org>

Thanks, pushed to master
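The fix boils down to "only log when something was accumulated". Below is a self-contained C sketch of that pattern. struct str is a hypothetical, minimal stand-in for OVS's struct ds, fprintf() stands in for VLOG_WARN(), and the emptiness test follows Ben's suggestion of checking the length rather than dereferencing the string.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal stand-in for OVS's 'struct ds'. */
struct str {
    char *s;        /* NUL-terminated contents, or NULL when empty. */
    size_t length;  /* Number of bytes before the terminator. */
};

/* Emits a warning only if messages were accumulated, and hands them to the
 * caller through '*errp' only on failure.  Returns true if it warned. */
bool
report_config_errors(struct str *errors, int err, char **errp)
{
    if (!errors->length) {
        return false;                           /* Success, nothing to say. */
    }
    if (errors->s[errors->length - 1] == '\n') {
        errors->s[--errors->length] = '\0';     /* Like ds_chomp(..., '\n'). */
    }
    fprintf(stderr, "warning: %s\n", errors->s);   /* Stands in for VLOG_WARN(). */
    if (err) {
        *errp = errors->s;      /* The caller now owns the buffer, */
        errors->s = NULL;       /* similar to ds_steal_cstr(). */
        errors->length = 0;
    }
    return true;
}
```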


Re: [ovs-dev] broken tests

2017-01-12 Thread Daniele Di Proietto





On 12/01/2017 09:24, "Ben Pfaff"  wrote:

>Commit 9fff138ec3a6dbe75073d16cba7fbe86ac273c36 "netdev: Add 'errp' to
>set_config()." breaks the unit tests because netdev-vport now logs lots
>of blank lines.  I am unsure of the right fix--is it to just drop the
>new VLOG_WARN call?

Hi Ben,

Sorry about that, I posted a patch that should fix this:

https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327564.html

If you're fine with it, I'll merge it shortly.

>
>Thanks,
>
>Ben.


[ovs-dev] [PATCH] netdev-vport: Do not log empty warnings on success.

2017-01-12 Thread Daniele Di Proietto
set_tunnel_config() always logs a warning, even on success. This
shouldn't happen.

Without this, some unit tests fail.

Fixes: 9fff138ec3a6("netdev: Add 'errp' to set_config().")
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/netdev-vport.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index ad5ffcc81..2db51df72 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -561,10 +561,12 @@ set_tunnel_config(struct netdev *dev_, const struct smap *args, char **errp)
 err = 0;
 
 out:
-ds_chomp(&errors, '\n');
-VLOG_WARN("%s", ds_cstr(&errors));
-if (err) {
-*errp = ds_steal_cstr(&errors);
+if (*ds_cstr(&errors)) {
+ds_chomp(&errors, '\n');
+VLOG_WARN("%s", ds_cstr(&errors));
+if (err) {
+*errp = ds_steal_cstr(&errors);
+}
 }
 
 ds_destroy(&errors);
-- 
2.11.0



Re: [ovs-dev] [PATCH v2] netdev-dpdk: Assign socket id according to device's numa id

2017-01-11 Thread Daniele Di Proietto
2017-01-12 6:18 GMT-08:00 Binbin Xu :
> DPDK ports specified via the 'dpdk-devargs' option can now be
> hotplug-attached.
>
> However, the socket id of such DPDK ports is not assigned correctly:
> it is always 0. The socket id of a DPDK port should be assigned
> according to the NUMA id of the device.
>
> Fixes: 55e075e65ef9e ("netdev-dpdk: Arbitrary 'dpdk' port naming")
> Signed-off-by: Binbin Xu 

Thanks a lot for fixing this, applied to master
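The core of the fix is a fallback plus one extra comparison in the reconfigure check. A C sketch with hypothetical names follows. The struct is a simplified stand-in for struct netdev_dpdk, not its real layout, and SOCKET0 = 0 is an assumption about the constant the patch uses.

```c
#include <stdbool.h>

#define SOCKET0 0   /* Assumed value of the fallback socket used in the patch. */

/* Hypothetical subset of the device state touched by the patch. */
struct dpdk_dev_state {
    int socket_id, requested_socket_id;
    int mtu, requested_mtu;
};

/* The patch treats a negative socket id from rte_eth_dev_socket_id() as
 * unknown and falls back to SOCKET0. */
int
pick_socket_id(int sid)
{
    return sid < 0 ? SOCKET0 : sid;
}

/* A new mempool is needed when either the MTU or the NUMA socket changes,
 * because the mempool is allocated on a specific socket. */
bool
needs_mempool_reconfigure(const struct dpdk_dev_state *d)
{
    return d->mtu != d->requested_mtu
           || d->socket_id != d->requested_socket_id;
}
```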

> ---
>  lib/netdev-dpdk.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 8bb9086..57ebdb3 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1197,6 +1197,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>  bool temp_flag;
>  const char *new_devargs;
>  int err = 0;
> +int sid;
>
>  ovs_mutex_lock(_mutex);
>  ovs_mutex_lock(>mutex);
> @@ -1242,6 +1243,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>  err = EADDRINUSE;
>  } else {
>  dev->devargs = xstrdup(new_devargs);
> +sid = rte_eth_dev_socket_id(new_port_id);
> +dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
>  dev->port_id = new_port_id;
>  netdev_request_reconfigure(>up);
>  err = 0;
> @@ -3140,7 +3143,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>  && netdev->n_rxq == dev->requested_n_rxq
>  && dev->mtu == dev->requested_mtu
>  && dev->rxq_size == dev->requested_rxq_size
> -&& dev->txq_size == dev->requested_txq_size) {
> +&& dev->txq_size == dev->requested_txq_size
> +&& dev->socket_id == dev->requested_socket_id) {
>  /* Reconfiguration is unnecessary */
>
>  goto out;
> @@ -3148,7 +3152,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>
>  rte_eth_dev_stop(dev->port_id);
>
> -if (dev->mtu != dev->requested_mtu) {
> +if (dev->mtu != dev->requested_mtu
> +|| dev->socket_id != dev->requested_socket_id) {
>  netdev_dpdk_mempool_configure(dev);
>  }
>
> --
> 1.8.3.1
>
>
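The patch falls back to SOCKET0 when rte_eth_dev_socket_id() returns a negative value, and rebuilds the mempool when either the MTU or the socket changes. A simplified sketch of that decision logic, assuming a reduced device struct; 'struct dev_state', request_socket() and needs_mempool_reconfigure() are hypothetical, and SOCKET0 is assumed to be 0.

```c
#include <stdbool.h>

#define SOCKET0 0   /* Assumed fallback socket id, mirroring the patch. */

/* Hypothetical subset of the device state touched by the patch. */
struct dev_state {
    int mtu, requested_mtu;
    int socket_id, requested_socket_id;
};

/* 'sid' is what a lookup such as rte_eth_dev_socket_id() returned;
 * a negative value means the NUMA node could not be determined. */
static void
request_socket(struct dev_state *dev, int sid)
{
    dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
}

/* The mempool is tied to a socket, so it must be rebuilt if either
 * the MTU or the socket changed. */
static bool
needs_mempool_reconfigure(const struct dev_state *dev)
{
    return dev->mtu != dev->requested_mtu
           || dev->socket_id != dev->requested_socket_id;
}
```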


Re: [ovs-dev] [PATCH v3] netdev-dummy: Limits the number of tx/rx queues.

2017-01-10 Thread Daniele Di Proietto
2017-01-09 21:56 GMT-08:00 nickcooper-zhangtonghao :
> This patch prevents ovs_rcu from reporting a WARN, caused by being
> blocked for a long time, when ovs-vswitchd processes a port with many
> rx/tx queues. The chosen limit on tx/rx queues per port should be
> appropriate, because DPDK uses the same value as its default maximum.
>
> Signed-off-by: nickcooper-zhangtonghao 

Applied to master, thanks

> ---
> v3:
> * Limits the number of tx/rx queues in set_config().
> * Adds a WARN log when a queue count exceeds DUMMY_MAX_QUEUES_PER_PORT.
> ---
>  lib/netdev-dummy.c | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
> index bdb77e1..4a23cba 100644
> --- a/lib/netdev-dummy.c
> +++ b/lib/netdev-dummy.c
> @@ -827,6 +827,8 @@ netdev_dummy_set_in6(struct netdev *netdev_, struct in6_addr *in6,
>  return 0;
>  }
>
> +#define DUMMY_MAX_QUEUES_PER_PORT 1024
> +
>  static int
>  netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args)
>  {
> @@ -870,6 +872,21 @@ netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args)
>
>  new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
>  new_n_txq = MAX(smap_get_int(args, "n_txq", NR_QUEUE), 1);
> +
> +if (new_n_rxq > DUMMY_MAX_QUEUES_PER_PORT ||
> +new_n_txq > DUMMY_MAX_QUEUES_PER_PORT) {
> +VLOG_WARN("The one or both of interface %s queues"
> +  "(rxq: %d, txq: %d) exceed %d. Sets it %d.\n",
> +  netdev->up.name,
> +  new_n_rxq,
> +  new_n_txq,
> +  DUMMY_MAX_QUEUES_PER_PORT,
> +  DUMMY_MAX_QUEUES_PER_PORT);
> +
> +new_n_rxq = MIN(DUMMY_MAX_QUEUES_PER_PORT, new_n_rxq);
> +new_n_txq = MIN(DUMMY_MAX_QUEUES_PER_PORT, new_n_txq);
> +}
> +
>  new_numa_id = smap_get_int(args, "numa_id", 0);
>  if (new_n_rxq != netdev->requested_n_rxq
>  || new_n_txq != netdev->requested_n_txq
> --
> 1.8.3.1
>
>
>
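The clamping in set_config() can be read as: floor each count at 1, then cap it at DUMMY_MAX_QUEUES_PER_PORT, warning if a cap was applied. A standalone sketch, where max_int()/min_int() stand in for OVS's MAX()/MIN() macros, fprintf() stands in for VLOG_WARN(), and clamp_queues() and the message wording are hypothetical.

```c
#include <stdio.h>

#define DUMMY_MAX_QUEUES_PER_PORT 1024   /* Limit used by the patch. */

static int
max_int(int a, int b)
{
    return a > b ? a : b;
}

static int
min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Clamps the requested rx/tx queue counts to [1, DUMMY_MAX_QUEUES_PER_PORT],
 * warning if either value had to be reduced.  'name' is only used in the
 * message. */
static void
clamp_queues(const char *name, int *n_rxq, int *n_txq)
{
    *n_rxq = max_int(*n_rxq, 1);
    *n_txq = max_int(*n_txq, 1);

    if (*n_rxq > DUMMY_MAX_QUEUES_PER_PORT
        || *n_txq > DUMMY_MAX_QUEUES_PER_PORT) {
        fprintf(stderr, "interface %s: queue count (rxq: %d, txq: %d) "
                "exceeds %d, clamping to %d\n", name, *n_rxq, *n_txq,
                DUMMY_MAX_QUEUES_PER_PORT, DUMMY_MAX_QUEUES_PER_PORT);
        *n_rxq = min_int(*n_rxq, DUMMY_MAX_QUEUES_PER_PORT);
        *n_txq = min_int(*n_txq, DUMMY_MAX_QUEUES_PER_PORT);
    }
}
```

Note the space between adjacent literals in the format string; without it, concatenation would glue the words together.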


[ovs-dev] [PATCH v3] dpdk: Late initialization.

2017-01-09 Thread Daniele Di Proietto
With this commit, we allow the user to set other_config:dpdk-init=true
after the process is started.  This makes it easier to start Open
vSwitch with DPDK using standard init scripts without restarting the
service.

This is still far from ideal, because initializing DPDK might still
abort the process (e.g. if there is not enough memory), so the user must
check the status of the process after setting dpdk-init to true.

Nonetheless, I think this is an improvement, because it doesn't require
restarting the whole unit.

CC: Aaron Conole <acon...@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
v3: Set 'enabled' after dpdk_init__()
---
 lib/dpdk-stub.c |  8 
 lib/dpdk.c  | 31 +--
 tests/ofproto-macros.at |  2 +-
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index bd981bb90..daef7291f 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -27,13 +27,13 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
 
-if (ovsthread_once_start()) {
-if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+if (ovsthread_once_start()) {
 VLOG_ERR("DPDK not supported in this copy of Open vSwitch.");
+ovsthread_once_done();
 }
-ovsthread_once_done();
 }
 }
 
diff --git a/lib/dpdk.c b/lib/dpdk.c
index ee4360b22..9ae249141 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -273,12 +273,6 @@ dpdk_init__(const struct smap *ovs_other_config)
 cpu_set_t cpuset;
 char *sock_dir_subcomponent;
 
-if (!smap_get_bool(ovs_other_config, "dpdk-init", false)) {
-VLOG_INFO("DPDK Disabled - to change this requires a restart.\n");
-return;
-}
-
-VLOG_INFO("DPDK Enabled, initializing");
 if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
 NAME_MAX, ovs_other_config,
 _dir_subcomponent)) {
@@ -413,11 +407,28 @@ dpdk_init__(const struct smap *ovs_other_config)
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+static bool enabled = false;
+
+if (enabled || !ovs_other_config) {
+return;
+}
+
+if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
 
-if (ovs_other_config && ovsthread_once_start()) {
-dpdk_init__(ovs_other_config);
-ovsthread_once_done();
+if (ovsthread_once_start(_enable)) {
+VLOG_INFO("DPDK Enabled - initializing...");
+dpdk_init__(ovs_other_config);
+enabled = true;
+VLOG_INFO("DPDK Enabled - initialized");
+ovsthread_once_done(_enable);
+}
+} else {
+static struct ovsthread_once once_disable = OVSTHREAD_ONCE_INITIALIZER;
+if (ovsthread_once_start(_disable)) {
+VLOG_INFO("DPDK Disabled - Use other_config:dpdk-init to enable");
+ovsthread_once_done(_disable);
+}
 }
 }
 
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 547b8..faff5b0a8 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -331,7 +331,7 @@ m4_define([_OVS_VSWITCHD_START],
 /ofproto|INFO|using datapath ID/d
 /netdev_linux|INFO|.*device has unknown hardware address family/d
 /ofproto|INFO|datapath ID changed to fedcba9876543210/d
-/dpdk|INFO|DPDK Disabled - to change this requires a restart./d']])
+/dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d']])
 ])
 
 # OVS_VSWITCHD_START([vsctl-args], [vsctl-output], [=override],
-- 
2.11.0

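The v3 dpdk_init() logic behaves like a small state machine: initialize at most once, ignore the setting afterwards, and print the "disabled" hint only once. A single-threaded sketch under that reading; the real code uses ovsthread_once for thread safety, and late_init(), expensive_init() and the counters here are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

static bool enabled;          /* Set once initialization has run. */
static bool logged_disabled;  /* "Disabled" hint already printed? */
static int init_calls;        /* Counts calls to the expensive init. */

static void
expensive_init(void)
{
    init_calls++;             /* Stands in for dpdk_init__(). */
}

/* Called on every database update with the current 'dpdk-init' value. */
static void
late_init(bool dpdk_init)
{
    if (enabled) {
        return;               /* Already initialized: ignore later changes. */
    }
    if (dpdk_init) {
        expensive_init();
        enabled = true;
        printf("DPDK Enabled - initialized\n");
    } else if (!logged_disabled) {
        logged_disabled = true;
        printf("DPDK Disabled - Use other_config:dpdk-init to enable\n");
    }
}
```

Calling late_init(false) repeatedly prints the hint once; the first late_init(true) initializes, and every later call, with either value, is a no-op.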


Re: [ovs-dev] [PATCH 2/4] datapath: Limits the number of tx/rx queues for netdev-dummy.

2017-01-09 Thread Daniele Di Proietto
2017-01-08 20:02 GMT-08:00 nickcooper-zhangtonghao <n...@opencloud.tech>:
> Thanks Daniele,
> Yes, it’s a small improvement, but it is necessary for us. I will check it
> in set_config(). One question: should we also check the tx/rx queues for
> netdev-dpdk in set_config()?

I think for DPDK devices there's ultimately no way to check without
actually setting up the queues; that's why it's done in reconfigure().

Thanks,

Daniele

>
> Now we check it in dpdk_eth_dev_init().
>
> Thanks.
>
>
>
> On Jan 9, 2017, at 11:22 AM, Daniele Di Proietto <diproiet...@ovn.org> wrote:
>
> 2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao <n...@opencloud.tech>:
>
> This patch prevents ovs_rcu from reporting a WARN, caused by being
> blocked for a long time, when ovs-vswitchd processes a port with many
> rx/tx queues. The chosen limit on tx/rx queues per port should be
> appropriate, because DPDK uses the same value as its default maximum.
>
> Signed-off-by: nickcooper-zhangtonghao <n...@opencloud.tech>
>
>
> I don't think this is a big deal, since netdev-dummy is only used for
> testing, but don't you think it's better to check it in set_config()
> and return an error?
>
> Also, could you use the prefix netdev-dummy, instead of datapath?
>
> Thanks,
>
> Daniele
>
>
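One way to read "no way to check without actually setting up the queues": the reconfigure path tries the requested number of queues and falls back to however many could actually be configured. A hypothetical sketch of such a retry policy; 'struct fake_dev', setup_queue() and setup_queues_with_fallback() are made up for illustration and are not netdev-dpdk code.

```c
#include <stdbool.h>

/* Hypothetical device whose real queue limit is only discovered by trying. */
struct fake_dev {
    int hw_max_queues;
};

/* Pretends to configure queue 'i'; fails once past the hardware limit. */
static bool
setup_queue(const struct fake_dev *dev, int i)
{
    return i < dev->hw_max_queues;
}

/* Tries to set up 'requested' queues.  If queue 'i' fails, retries with
 * 'i' queues.  Returns the number actually configured (0 on failure). */
static int
setup_queues_with_fallback(const struct fake_dev *dev, int requested)
{
    int n = requested;

    while (n > 0) {
        int i;

        for (i = 0; i < n; i++) {
            if (!setup_queue(dev, i)) {
                break;
            }
        }
        if (i == n) {
            return n;         /* All queues configured. */
        }
        n = i;                /* Retry with the number that worked. */
    }
    return 0;
}
```

The point of the design is that the limit comes from the device at setup time, which set_config() cannot observe.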


Re: [ovs-dev] [PATCH v2] dpdk: Late initialization.

2017-01-09 Thread Daniele Di Proietto

On 09/01/2017 03:49, "nickcooper-zhangtonghao" <n...@opencloud.tech> wrote:

>
>hi Daniele,
>I reviewed this patch. One question: should we check the available
>hugepage memory before calling rte_eal_init()? Perhaps as an
>improvement in the next version?

How would you suggest checking for hugepages before calling rte_eal_init()?

I think everybody agrees that in the long term we need to avoid aborting if
the initialization fails, but most of that work needs to happen in the DPDK
library.

If there's a simple check we could do here, I'm fine with including it. If
it's something more complicated that needs to be a separate patch, we should
probably defer it, since we're in feature freeze now.

Thanks,

Daniele

>
>Thanks.
>Nick
>
>On Jan 9, 2017, at 11:21 AM, Daniele Di Proietto <diproiet...@vmware.com> wrote:
>
>With this commit, we allow the user to set other_config:dpdk-init=true
>after the process is started.  This makes it easier to start Open
>vSwitch with DPDK using standard init scripts without restarting the
>service.
>
>This is still far from ideal, because initializing DPDK might still
>abort the process (e.g. if there is not enough memory), so the user must
>check the status of the process after setting dpdk-init to true.
>
>Nonetheless, I think this is an improvement, because it doesn't require
>restarting the whole unit.
>
>CC: Aaron Conole <acon...@redhat.com>
>Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>---
>v1->v2: No change, first non-RFC post.
>---
> lib/dpdk-stub.c |  8 
> lib/dpdk.c  | 30 --
> tests/ofproto-macros.at |  2 +-
> 3 files changed, 25 insertions(+), 15 deletions(-)
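On Linux, one possible "simple check" would be to look at the HugePages_Free field of /proc/meminfo before calling rte_eal_init(). A hypothetical parsing sketch; hugepages_free() is not an OVS or DPDK function, and the field only reflects the default huge page size, so at best this would be a rough pre-check.

```c
#include <stdio.h>
#include <string.h>

/* Returns the value of the "HugePages_Free:" field from text in the
 * format of Linux's /proc/meminfo, or -1 if the field is missing. */
static long
hugepages_free(const char *meminfo)
{
    const char *p = strstr(meminfo, "HugePages_Free:");
    long value;

    /* The space in the format matches any run of whitespace. */
    if (!p || sscanf(p, "HugePages_Free: %ld", &value) != 1) {
        return -1;
    }
    return value;
}
```

On a live system the caller would first read /proc/meminfo into a buffer and pass it in.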


Re: [ovs-dev] [PATCH v2] dpdk: Late initialization.

2017-01-09 Thread Daniele Di Proietto

On 09/01/2017 07:14, "Aaron Conole" <acon...@redhat.com> wrote:

>Daniele Di Proietto <diproiet...@vmware.com> writes:
>
>> With this commit, we allow the user to set other_config:dpdk-init=true
>> after the process is started.  This makes it easier to start Open
>> vSwitch with DPDK using standard init scripts without restarting the
>> service.
>>
>> This is still far from ideal, because initializing DPDK might still
>> abort the process (e.g. if there is not enough memory), so the user must
>> check the status of the process after setting dpdk-init to true.
>>
>> Nonetheless, I think this is an improvement, because it doesn't require
>> restarting the whole unit.
>>
>> CC: Aaron Conole <acon...@redhat.com>
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> ---
>> v1->v2: No change, first non-RFC post.
>> ---
>
>Looks good - just one minor detail below
>
>>  lib/dpdk-stub.c |  8 
>>  lib/dpdk.c  | 30 --
>>  tests/ofproto-macros.at |  2 +-
>>  3 files changed, 25 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
>> index bd981bb90..daef7291f 100644
>> --- a/lib/dpdk-stub.c
>> +++ b/lib/dpdk-stub.c
>> @@ -27,13 +27,13 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
>>  void
>>  dpdk_init(const struct smap *ovs_other_config)
>>  {
>> -static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
>> +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>>  
>> -if (ovsthread_once_start()) {
>> -if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
>> +if (ovsthread_once_start()) {
>>  VLOG_ERR("DPDK not supported in this copy of Open vSwitch.");
>> +ovsthread_once_done();
>>  }
>> -ovsthread_once_done();
>>  }
>>  }
>>  
>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>> index ee4360b22..008c6c06d 100644
>> --- a/lib/dpdk.c
>> +++ b/lib/dpdk.c
>> @@ -273,12 +273,6 @@ dpdk_init__(const struct smap *ovs_other_config)
>>  cpu_set_t cpuset;
>>  char *sock_dir_subcomponent;
>>  
>> -if (!smap_get_bool(ovs_other_config, "dpdk-init", false)) {
>> -VLOG_INFO("DPDK Disabled - to change this requires a restart.\n");
>> -return;
>> -}
>> -
>> -VLOG_INFO("DPDK Enabled, initializing");
>>  if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
>>  NAME_MAX, ovs_other_config,
>>  _dir_subcomponent)) {
>> @@ -413,11 +407,27 @@ dpdk_init__(const struct smap *ovs_other_config)
>>  void
>>  dpdk_init(const struct smap *ovs_other_config)
>>  {
>> -static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +static bool enabled = false;
>
>This doesn't appear to be used, apart from the first test of the
>following conditional (where it will always pass to the second).  Did I
>miss something?

Oops, it should be used.

I need to set it to true after calling dpdk_init__(); otherwise, the following
scenario might happen:

1) other_config:dpdk-init is set to "true"
2) vswitchd is started and dpdk_init__() is called
3) other_config:dpdk-init is set to "false"
4) The log message "DPDK Disabled - Use other_config:dpdk-init to enable" is 
printed, giving the illusion that DPDK was disabled.

I'll add 'enabled = true' in the next version.
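The scenario above can be replayed with a small model that toggles whether 'enabled' is set after initialization. This is a hypothetical, single-threaded sketch, not the OVS code; replay() and 'enum last_log' are illustrative names.

```c
#include <stdbool.h>

enum last_log { LOG_NONE, LOG_ENABLED, LOG_DISABLED };

/* Feeds a sequence of 'dpdk-init' values to a model of dpdk_init() and
 * returns the last message it would have logged.  With 'set_enabled'
 * false this models v2 (the flag is never set); with it true, v3. */
static enum last_log
replay(const bool *dpdk_init_values, int n, bool set_enabled)
{
    bool enabled = false, did_init = false, logged_disabled = false;
    enum last_log last = LOG_NONE;
    int i;

    for (i = 0; i < n; i++) {
        if (enabled) {
            continue;                  /* Already initialized. */
        }
        if (dpdk_init_values[i]) {
            if (!did_init) {           /* Models the 'once_enable' guard. */
                did_init = true;
                last = LOG_ENABLED;
            }
            if (set_enabled) {
                enabled = true;
            }
        } else if (!logged_disabled) { /* Models the 'once_disable' guard. */
            logged_disabled = true;
            last = LOG_DISABLED;
        }
    }
    return last;
}
```

With the sequence { true, false }, the v2 model ends on the misleading "Disabled" message, while the v3 model stays on "Enabled".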

Thanks,

Daniele

>
>> +if (enabled || !ovs_other_config) {
>> +return;
>> +}
>> +
>> +if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
>> +static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
>>  
>> -if (ovs_other_config && ovsthread_once_start()) {
>> -dpdk_init__(ovs_other_config);
>> -ovsthread_once_done();
>> +if (ovsthread_once_start(_enable)) {
>> +VLOG_INFO("DPDK Enabled - initializing...");
>> +dpdk_init__(ovs_other_config);
>> +VLOG_INFO("DPDK Enabled - initialized");
>> +ovsthread_once_done(_enable);
>> +}
>> +} else {
>> +static struct ovsthread_once once_disable = OVSTHREAD_ONCE_INITIALIZER;
>> +if (ovsthread_once_start(_disable)) {
>> + 

Re: [ovs-dev] [PATCH 4/4] datapath: Uses the OVS_CORE_UNSPEC instead of magic numbers.

2017-01-08 Thread Daniele Di Proietto
2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
> This patch uses OVS_CORE_UNSPEC instead of "-1" for unpinned
> queues. More importantly, "-1" cast to unsigned int is equal to
> NON_PMD_CORE_ID, so this makes the distinction between them explicit.
>
> Signed-off-by: nickcooper-zhangtonghao 

Thanks, this bothered me as well.  In fact I sent a patch for it in
the past as part of a series:

https://mail.openvswitch.org/pipermail/ovs-dev/2016-December/325692.html.

This shouldn't fix any actual problem, because I think we only compared
core_id with pmd threads (not with the non-pmd thread), but I agree that
using -1 for an unsigned is not pretty.

I fixed the title and applied this to master, thanks

> ---
>  lib/dpif-netdev.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 0b73056..99e4d35 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1293,7 +1293,7 @@ port_create(const char *devname, const char *type,
>   devname, ovs_strerror(errno));
>  goto out_rxq_close;
>  }
> -port->rxqs[i].core_id = -1;
> +port->rxqs[i].core_id = OVS_CORE_UNSPEC;
>  n_open_rxqs++;
>  }
>
> @@ -1517,7 +1517,7 @@ has_pmd_rxq_for_numa(struct dp_netdev *dp, int numa_id)
>  for (i = 0; i < port->n_rxq; i++) {
>  unsigned core_id = port->rxqs[i].core_id;
>
> -if (core_id != -1
> +if (core_id != OVS_CORE_UNSPEC
>  && ovs_numa_get_numa_id(core_id) == numa_id) {
>  return true;
>  }
> @@ -2704,7 +2704,7 @@ parse_affinity_list(const char *affinity_list, unsigned *core_ids, int n_rxq)
>  int error = 0;
>
>  for (i = 0; i < n_rxq; i++) {
> -core_ids[i] = -1;
> +core_ids[i] = OVS_CORE_UNSPEC;
>  }
>
>  if (!affinity_list) {
> @@ -3617,7 +3617,7 @@ dp_netdev_add_port_rx_to_pmds(struct dp_netdev *dp,
>
>  for (i = 0; i < port->n_rxq; i++) {
>  if (pinned) {
> -if (port->rxqs[i].core_id == -1) {
> +if (port->rxqs[i].core_id == OVS_CORE_UNSPEC) {
>  continue;
>  }
>  pmd = dp_netdev_get_pmd(dp, port->rxqs[i].core_id);
> @@ -3631,7 +3631,7 @@ dp_netdev_add_port_rx_to_pmds(struct dp_netdev *dp,
>  pmd->isolated = true;
>  dp_netdev_pmd_unref(pmd);
>  } else {
> -if (port->rxqs[i].core_id != -1) {
> +if (port->rxqs[i].core_id != OVS_CORE_UNSPEC) {
>  continue;
>  }
>  pmd = dp_netdev_less_loaded_pmd_on_numa(dp, numa_id);
> @@ -3760,7 +3760,7 @@ dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
>  for (i = 0; i < port->n_rxq; i++) {
>  unsigned core_id = port->rxqs[i].core_id;
>
> -if (core_id != -1) {
> +if (core_id != OVS_CORE_UNSPEC) {
>  numa_id = ovs_numa_get_numa_id(core_id);
>  hmapx_add(, (void *) numa_id);
>  }
> --
> 1.8.3.1
>
>
>
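A quick way to see the collision the patch avoids: storing -1 in an unsigned yields UINT_MAX, which equals NON_PMD_CORE_ID on platforms with a 32-bit unsigned int, while OVS_CORE_UNSPEC does not. The sketch assumes OVS_CORE_UNSPEC is INT_MAX and NON_PMD_CORE_ID is UINT32_MAX, defined locally rather than taken from the OVS headers; both helper functions are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Local copies of the assumed OVS definitions. */
#define OVS_CORE_UNSPEC INT_MAX
#define NON_PMD_CORE_ID UINT32_MAX

/* Does the old "-1" sentinel collide with the non-pmd core id? */
static int
minus_one_is_non_pmd(void)
{
    unsigned core_id = -1;              /* Converts to UINT_MAX. */
    return core_id == NON_PMD_CORE_ID;
}

/* Does OVS_CORE_UNSPEC collide with the non-pmd core id? */
static int
unspec_is_non_pmd(void)
{
    unsigned core_id = OVS_CORE_UNSPEC;
    return core_id == NON_PMD_CORE_ID;
}
```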


Re: [ovs-dev] [PATCH 3/4] datapath: Uses the NR_QUEUE instead of magic numbers.

2017-01-08 Thread Daniele Di Proietto
2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
> NR_QUEUE is defined in "lib/dpif-netdev.h", and netdev-dpdk uses
> it instead of a magic number. netdev-dummy should do the same.
>
> Signed-off-by: nickcooper-zhangtonghao 

Thanks, I changed the prefix of the commit message and applied to master

> ---
>  lib/netdev-dummy.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
> index d75e597..8d9c805 100644
> --- a/lib/netdev-dummy.c
> +++ b/lib/netdev-dummy.c
> @@ -868,8 +868,8 @@ netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args)
>  goto exit;
>  }
>
> -new_n_rxq = MAX(smap_get_int(args, "n_rxq", 1), 1);
> -new_n_txq = MAX(smap_get_int(args, "n_txq", 1), 1);
> +new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> +new_n_txq = MAX(smap_get_int(args, "n_txq", NR_QUEUE), 1);
>  new_numa_id = smap_get_int(args, "numa_id", 0);
>  if (new_n_rxq != netdev->requested_n_rxq
>  || new_n_txq != netdev->requested_n_txq
> --
> 1.8.3.1
>
>
>
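The default-plus-floor pattern used for n_rxq/n_txq can be sketched as follows, assuming NR_QUEUE is 1 (as the replaced magic number suggests). get_int_or_default() is a hypothetical stand-in for smap_get_int(), where a NULL pointer means the key is absent.

```c
#include <stddef.h>

#define NR_QUEUE 1   /* Assumed value of NR_QUEUE from lib/dpif-netdev.h. */

/* Returns '*value' if the key was configured ('value' non-NULL),
 * otherwise 'def'. */
static int
get_int_or_default(const int *value, int def)
{
    return value ? *value : def;
}

/* Mirrors MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1): default to
 * NR_QUEUE when unset, and never go below one queue. */
static int
requested_queues(const int *configured)
{
    int n = get_int_or_default(configured, NR_QUEUE);

    return n > 1 ? n : 1;
}
```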


Re: [ovs-dev] [PATCH 2/4] datapath: Limits the number of tx/rx queues for netdev-dummy.

2017-01-08 Thread Daniele Di Proietto
2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
> This patch prevents ovs_rcu from reporting a WARN, caused by being
> blocked for a long time, when ovs-vswitchd processes a port with many
> rx/tx queues. The chosen limit on tx/rx queues per port should be
> appropriate, because DPDK uses the same value as its default maximum.
>
> Signed-off-by: nickcooper-zhangtonghao 

I don't think this is a big deal, since netdev-dummy is only used for
testing, but don't you think it's better to check it in set_config()
and return an error?

Also, could you use the prefix netdev-dummy, instead of datapath?

Thanks,

Daniele

> ---
>  lib/netdev-dummy.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
> index d406cbc..d75e597 100644
> --- a/lib/netdev-dummy.c
> +++ b/lib/netdev-dummy.c
> @@ -897,6 +897,9 @@ netdev_dummy_get_numa_id(const struct netdev *netdev_)
>  return numa_id;
>  }
>
> +
> +#define DUMMY_MAX_QUEUES_PER_PORT 1024
> +
>  /* Sets the number of tx queues and rx queues for the dummy PMD interface. */
>  static int
>  netdev_dummy_reconfigure(struct netdev *netdev_)
> @@ -905,8 +908,8 @@ netdev_dummy_reconfigure(struct netdev *netdev_)
>
>  ovs_mutex_lock(>mutex);
>
> -netdev_->n_txq = netdev->requested_n_txq;
> -netdev_->n_rxq = netdev->requested_n_rxq;
> +netdev_->n_txq = MIN(DUMMY_MAX_QUEUES_PER_PORT, netdev->requested_n_txq);
> +netdev_->n_rxq = MIN(DUMMY_MAX_QUEUES_PER_PORT, netdev->requested_n_rxq);
>  netdev->numa_id = netdev->requested_numa_id;
>
>  ovs_mutex_unlock(>mutex);
> --
> 1.8.3.1
>
>
>
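The suggestion of rejecting the value in set_config() instead of clamping it in reconfigure() could look roughly like this. It is a hypothetical sketch: validate_n_queues() and its message are made up, and OVS code would more likely build the string with xasprintf() than with malloc()/snprintf().

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define DUMMY_MAX_QUEUES_PER_PORT 1024

/* Rejects an out-of-range queue count with EINVAL and an allocated error
 * string in '*errp' (the caller frees it).  Returns 0 and sets '*errp'
 * to NULL if the counts are acceptable. */
static int
validate_n_queues(const char *name, int n_rxq, int n_txq, char **errp)
{
    if (n_rxq > DUMMY_MAX_QUEUES_PER_PORT
        || n_txq > DUMMY_MAX_QUEUES_PER_PORT) {
        char *msg = malloc(128);

        if (msg) {
            snprintf(msg, 128, "%s: rxq %d / txq %d exceeds limit %d",
                     name, n_rxq, n_txq, DUMMY_MAX_QUEUES_PER_PORT);
        }
        *errp = msg;
        return EINVAL;
    }
    *errp = NULL;
    return 0;
}
```

Returning an error makes the misconfiguration visible to the caller instead of silently running with fewer queues.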


[ovs-dev] [PATCH v2] dpdk: Late initialization.

2017-01-08 Thread Daniele Di Proietto
With this commit, we allow the user to set other_config:dpdk-init=true
after the process is started.  This makes it easier to start Open
vSwitch with DPDK using standard init scripts without restarting the
service.

This is still far from ideal, because initializing DPDK might still
abort the process (e.g. if there is not enough memory), so the user must
check the status of the process after setting dpdk-init to true.

Nonetheless, I think this is an improvement, because it doesn't require
restarting the whole unit.

CC: Aaron Conole <acon...@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
v1->v2: No change, first non-RFC post.
---
 lib/dpdk-stub.c |  8 
 lib/dpdk.c  | 30 --
 tests/ofproto-macros.at |  2 +-
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index bd981bb90..daef7291f 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -27,13 +27,13 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
 
-if (ovsthread_once_start()) {
-if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+if (ovsthread_once_start()) {
 VLOG_ERR("DPDK not supported in this copy of Open vSwitch.");
+ovsthread_once_done();
 }
-ovsthread_once_done();
 }
 }
 
diff --git a/lib/dpdk.c b/lib/dpdk.c
index ee4360b22..008c6c06d 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -273,12 +273,6 @@ dpdk_init__(const struct smap *ovs_other_config)
 cpu_set_t cpuset;
 char *sock_dir_subcomponent;
 
-if (!smap_get_bool(ovs_other_config, "dpdk-init", false)) {
-VLOG_INFO("DPDK Disabled - to change this requires a restart.\n");
-return;
-}
-
-VLOG_INFO("DPDK Enabled, initializing");
 if (process_vhost_flags("vhost-sock-dir", xstrdup(ovs_rundir()),
 NAME_MAX, ovs_other_config,
 _dir_subcomponent)) {
@@ -413,11 +407,27 @@ dpdk_init__(const struct smap *ovs_other_config)
 void
 dpdk_init(const struct smap *ovs_other_config)
 {
-static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
+static bool enabled = false;
+
+if (enabled || !ovs_other_config) {
+return;
+}
+
+if (smap_get_bool(ovs_other_config, "dpdk-init", false)) {
+static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
 
-if (ovs_other_config && ovsthread_once_start()) {
-dpdk_init__(ovs_other_config);
-ovsthread_once_done();
+if (ovsthread_once_start(_enable)) {
+VLOG_INFO("DPDK Enabled - initializing...");
+dpdk_init__(ovs_other_config);
+VLOG_INFO("DPDK Enabled - initialized");
+ovsthread_once_done(_enable);
+}
+} else {
+static struct ovsthread_once once_disable = OVSTHREAD_ONCE_INITIALIZER;
+if (ovsthread_once_start(_disable)) {
+VLOG_INFO("DPDK Disabled - Use other_config:dpdk-init to enable");
+ovsthread_once_done(_disable);
+}
 }
 }
 
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 547b8..faff5b0a8 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -331,7 +331,7 @@ m4_define([_OVS_VSWITCHD_START],
 /ofproto|INFO|using datapath ID/d
 /netdev_linux|INFO|.*device has unknown hardware address family/d
 /ofproto|INFO|datapath ID changed to fedcba9876543210/d
-/dpdk|INFO|DPDK Disabled - to change this requires a restart./d']])
+/dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d']])
 ])
 
 # OVS_VSWITCHD_START([vsctl-args], [vsctl-output], [=override],
-- 
2.11.0



Re: [ovs-dev] [PATCH 1/4] datapath: Fix formatting typo.

2017-01-08 Thread Daniele Di Proietto
2017-01-08 17:30 GMT-08:00 nickcooper-zhangtonghao :
> Signed-off-by: nickcooper-zhangtonghao 

Thanks, I changed the prefix to netdev-dpdk (instead of datapath) and
pushed this to master

> ---
>  lib/netdev-dpdk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 625f425..376aa4d 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -738,7 +738,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
>
>  memset(_addr, 0x0, sizeof(eth_addr));
>  rte_eth_macaddr_get(dev->port_id, _addr);
> -VLOG_INFO_RL(, "Port %d: "ETH_ADDR_FMT"",
> +VLOG_INFO_RL(, "Port %d: "ETH_ADDR_FMT,
>  dev->port_id, ETH_ADDR_BYTES_ARGS(eth_addr.addr_bytes));
>
>  memcpy(dev->hwaddr.ea, eth_addr.addr_bytes, ETH_ADDR_LEN);
> --
> 1.8.3.1
>
>
>
>
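The change is purely cosmetic: adjacent string literals are concatenated at compile time, so a trailing "" adds nothing to the format string. A sketch with a simplified, hypothetical ADDR_FMT macro standing in for ETH_ADDR_FMT (the real macro expands to a longer format).

```c
#include <string.h>

/* Hypothetical simplified format macro. */
#define ADDR_FMT "%02x:%02x"

/* Both arrays end up holding exactly the same characters. */
static const char with_empty[] = "Port %d: "ADDR_FMT"";
static const char without[] = "Port %d: "ADDR_FMT;

static int
formats_equal(void)
{
    return strcmp(with_empty, without) == 0
           && sizeof with_empty == sizeof without;
}
```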


[ovs-dev] [PATCH v3 16/18] dpif-netdev: Use hmap for poll_list in pmd threads.

2017-01-08 Thread Daniele Di Proietto
A future commit will use this to determine whether a queue is already
contained in a pmd thread.

To keep the behavior unaltered, we now have to sort queues before
printing them in pmd_info_show_rxq().

This commit also introduces 'struct polled_queue', which will be used
exclusively in the fast path, makes 'struct rxq_poll' point to a
'struct dp_netdev_rxq', and consistently uses 'rx' for 'netdev_rxq'
and 'rxq' for 'dp_netdev_rxq'.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c | 168 --
 1 file changed, 112 insertions(+), 56 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f170f5c96..d996e3c9a 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -280,8 +280,12 @@ enum pmd_cycles_counter_type {
 
 /* Contained by struct dp_netdev_port's 'rxqs' member.  */
 struct dp_netdev_rxq {
-struct netdev_rxq *rxq;
-unsigned core_id;   /* Сore to which this queue is pinned. */
+struct dp_netdev_port *port;
+struct netdev_rxq *rx;
+unsigned core_id;  /* Core to which this queue should be
+  pinned. OVS_CORE_UNSPEC if the
+  queue doesn't need to be pinned to a
+  particular core. */
 };
 
 /* A port in a netdev-based datapath. */
@@ -415,11 +419,15 @@ struct dp_netdev_pmd_cycles {
 atomic_ullong n[PMD_N_CYCLES];
 };
 
+struct polled_queue {
+struct netdev_rxq *rx;
+odp_port_t port_no;
+};
+
 /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
 struct rxq_poll {
-struct dp_netdev_port *port;
-struct netdev_rxq *rx;
-struct ovs_list node;
+struct dp_netdev_rxq *rxq;
+struct hmap_node node;
 };
 
 /* Contained by struct dp_netdev_pmd_thread's 'send_port_cache',
@@ -500,9 +508,7 @@ struct dp_netdev_pmd_thread {
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
-struct ovs_list poll_list OVS_GUARDED;
-/* Number of elements in 'poll_list' */
-int poll_cnt;
+struct hmap poll_list OVS_GUARDED;
 /* Map of 'tx_port's used for transmission.  Written by the main thread,
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
@@ -586,8 +592,8 @@ static void dp_netdev_add_port_to_pmds(struct dp_netdev *dp,
 static void dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd,
  struct dp_netdev_port *port);
 static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
- struct dp_netdev_port *port,
- struct netdev_rxq *rx);
+ struct dp_netdev_rxq *rxq)
+OVS_REQUIRES(pmd->port_mutex);
 static struct dp_netdev_pmd_thread *
 dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id);
 static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
@@ -783,12 +789,56 @@ pmd_info_clear_stats(struct ds *reply OVS_UNUSED,
 }
 }
 
+static int
+compare_poll_list(const void *a_, const void *b_)
+{
+const struct rxq_poll *a = a_;
+const struct rxq_poll *b = b_;
+
+const char *namea = netdev_rxq_get_name(a->rxq->rx);
+const char *nameb = netdev_rxq_get_name(b->rxq->rx);
+
+int cmp = strcmp(namea, nameb);
+if (!cmp) {
+return netdev_rxq_get_queue_id(a->rxq->rx)
+   - netdev_rxq_get_queue_id(b->rxq->rx);
+} else {
+return cmp;
+}
+}
+
+static void
+sorted_poll_list(struct dp_netdev_pmd_thread *pmd, struct rxq_poll **list,
+ size_t *n)
+{
+struct rxq_poll *ret, *poll;
+size_t i;
+
+*n = hmap_count(>poll_list);
+if (!*n) {
+ret = NULL;
+} else {
+ret = xcalloc(*n, sizeof *ret);
+i = 0;
+HMAP_FOR_EACH (poll, node, >poll_list) {
+ret[i] = *poll;
+i++;
+}
+ovs_assert(i == *n);
+}
+
+qsort(ret, *n, sizeof *ret, compare_poll_list);
+
+*list = ret;
+}
+
 static void
 pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
 {
 if (pmd->core_id != NON_PMD_CORE_ID) {
-struct rxq_poll *poll;
 const char *prev_name = NULL;
+struct rxq_poll *list;
+size_t i, n;
 
 ds_put_format(reply,
   "pmd thread numa_id %d core_id %u:\n\tisolated : %s\n",
@@ -796,21 +846,23 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
   ? "true" : "false");
 
 ovs_mutex_lock(>port_mutex);
-LIST_FOR_EACH (poll, node, >poll_list) {
-const char *name = netdev_get_name(poll->port->netdev);
+sorted_poll_list(pmd, , );
+for (i = 0; i < n; i++) {
+   
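Because hmap iteration order is arbitrary, the patch copies the poll list into an array and sorts it by port name and then queue id before printing. A standalone sketch of that ordering, assuming a flattened 'struct rxq_entry' (hypothetical) in place of 'struct rxq_poll'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flattened view of an rx queue: port name plus queue id. */
struct rxq_entry {
    const char *name;
    int queue_id;
};

/* Orders by port name, then by queue id, like compare_poll_list(). */
static int
compare_rxq_entry(const void *a_, const void *b_)
{
    const struct rxq_entry *a = a_;
    const struct rxq_entry *b = b_;
    int cmp = strcmp(a->name, b->name);

    if (cmp) {
        return cmp;
    }
    return (a->queue_id > b->queue_id) - (a->queue_id < b->queue_id);
}

/* Sorts a copy of the queues so the output order is deterministic. */
static void
sort_rxqs(struct rxq_entry *list, size_t n)
{
    if (n) {
        qsort(list, n, sizeof *list, compare_rxq_entry);
    }
}
```

The (a > b) - (a < b) idiom avoids the overflow that plain subtraction could hit with arbitrary ints; for small non-negative queue ids the two are equivalent.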

[ovs-dev] [PATCH v3 18/18] ovs-numa: Remove unused functions.

2017-01-08 Thread Daniele Di Proietto
ovs-numa doesn't need to keep the state of the pmd threads; that is an
implementation detail of dpif-netdev.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ovs-numa.c | 175 -
 lib/ovs-numa.h |   7 ---
 2 files changed, 182 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index 04225a958..2e038b745 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -70,8 +70,6 @@ struct cpu_core {
 struct ovs_list list_node; /* In 'numa_node->cores' list. */
 struct numa_node *numa;/* numa node containing the core. */
 unsigned core_id;  /* Core id. */
-bool available;/* If the core can be pinned. */
-bool pinned;   /* If a thread has been pinned to the core. */
 };
 
 /* Contains all 'struct numa_node's. */
@@ -119,7 +117,6 @@ insert_new_cpu_core(struct numa_node *n, unsigned core_id)
 ovs_list_insert(>cores, >list_node);
 c->core_id = core_id;
 c->numa = n;
-c->available = true;
 
 return c;
 }
@@ -342,18 +339,6 @@ ovs_numa_core_id_is_valid(unsigned core_id)
 return found_numa_and_core && core_id < ovs_numa_get_n_cores();
 }
 
-bool
-ovs_numa_core_is_pinned(unsigned core_id)
-{
-struct cpu_core *core = get_core_by_core_id(core_id);
-
-if (core) {
-return core->pinned;
-}
-
-return false;
-}
-
 /* Returns the number of numa nodes. */
 int
 ovs_numa_get_n_numas(void)
@@ -398,97 +383,6 @@ ovs_numa_get_n_cores_on_numa(int numa_id)
 return OVS_CORE_UNSPEC;
 }
 
-/* Returns the number of cpu cores that are available and unpinned
- * on numa node.  Returns OVS_CORE_UNSPEC if 'numa_id' is invalid. */
-int
-ovs_numa_get_n_unpinned_cores_on_numa(int numa_id)
-{
-struct numa_node *numa = get_numa_by_numa_id(numa_id);
-
-if (numa) {
-struct cpu_core *core;
-int count = 0;
-
-LIST_FOR_EACH(core, list_node, >cores) {
-if (core->available && !core->pinned) {
-count++;
-}
-}
-return count;
-}
-
-return OVS_CORE_UNSPEC;
-}
-
-/* Given 'core_id', tries to pin that core.  Returns true, if succeeds.
- * False, if the core has already been pinned, or if it is invalid or
- * not available. */
-bool
-ovs_numa_try_pin_core_specific(unsigned core_id)
-{
-struct cpu_core *core = get_core_by_core_id(core_id);
-
-if (core) {
-if (core->available && !core->pinned) {
-core->pinned = true;
-return true;
-}
-}
-
-return false;
-}
-
-/* Searches through all cores for an unpinned and available core.  Returns
- * the 'core_id' if found and sets the 'core->pinned' to true.  Otherwise,
- * returns OVS_CORE_UNSPEC. */
-unsigned
-ovs_numa_get_unpinned_core_any(void)
-{
-struct cpu_core *core;
-
-HMAP_FOR_EACH(core, hmap_node, &all_cpu_cores) {
-if (core->available && !core->pinned) {
-core->pinned = true;
-return core->core_id;
-}
-}
-
-return OVS_CORE_UNSPEC;
-}
-
-/* Searches through all cores on numa node with 'numa_id' for an
- * unpinned and available core.  Returns the core_id if found and
- * sets the 'core->pinned' to true.  Otherwise, returns OVS_CORE_UNSPEC. */
-unsigned
-ovs_numa_get_unpinned_core_on_numa(int numa_id)
-{
-struct numa_node *numa = get_numa_by_numa_id(numa_id);
-
-if (numa) {
-struct cpu_core *core;
-
-LIST_FOR_EACH(core, list_node, &numa->cores) {
-if (core->available && !core->pinned) {
-core->pinned = true;
-return core->core_id;
-}
-}
-}
-
-return OVS_CORE_UNSPEC;
-}
-
-/* Unpins the core with 'core_id'. */
-void
-ovs_numa_unpin_core(unsigned core_id)
-{
-struct cpu_core *core = get_core_by_core_id(core_id);
-
-if (core) {
-core->pinned = false;
-}
-}
-
 static struct ovs_numa_dump *
 ovs_numa_dump_create(void)
 {
@@ -654,75 +548,6 @@ ovs_numa_dump_destroy(struct ovs_numa_dump *dump)
 free(dump);
 }
 
-/* Reads the cpu mask configuration from 'cmask' and sets the
- * 'available' of corresponding cores.  For unspecified cores,
- * sets 'available' to false. */
-void
-ovs_numa_set_cpu_mask(const char *cmask)
-{
-int core_id = 0;
-int i;
-int end_idx;
-
-if (!found_numa_and_core) {
-return;
-}
-
-/* If no mask specified, resets the 'available' to true for all cores. */
-if (!cmask) {
-struct cpu_core *core;
-
-HMAP_FOR_EACH(core, hmap_node, &all_cpu_cores) {
-core->available = true;
-}
-
-return;
-}
-
-/* Ignore leading 0x. */
-end_idx = 0;
-if (!strncmp(cmask, "0x", 2) || !strncmp(cmask, "0X", 2)) {
-end_idx = 2;
-}
-
-for (i = strlen(cmask) - 1; i >= end_idx; i-

[ovs-dev] [PATCH v3 11/18] dpctl: Avoid making assumptions on pmd threads.

2017-01-08 Thread Daniele Di Proietto
Currently dpctl depends on the ovs-numa module to delete and create flows
on different pmd threads for pmd devices.

The next commits will move the pmd thread state out of ovs-numa and into
dpif-netdev, so the ovs-numa interface will no longer be supported.

Also, the assignment of ports to threads is an implementation detail of
dpif-netdev; dpctl shouldn't know anything about it.

This commit changes the dpif_flow_put() and dpif_flow_del() calls to
iterate over all the pmd threads when pmd_id is PMD_ID_NULL.

A simple test is added.
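
A minimal sketch of the PMD_ID_NULL fan-out described above, under stated assumptions: `struct pmd_flows`, `put_on_pmd()` and `flow_put_sketch()` are hypothetical, each pmd is reduced to a core id and a flow counter, and PMD_ID_NULL is redefined locally as INT_MAX on the assumption that it equals OVS_CORE_UNSPEC. This is not the dpif-netdev implementation. It only shows one flow request being applied to every pmd when no specific pmd is named, and to a single pmd otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Local stand-in; assumed to match OVS_CORE_UNSPEC (INT_MAX). */
#define PMD_ID_NULL INT_MAX

/* Hypothetical per-thread state: each pmd owns a private flow table. */
struct pmd_flows {
    unsigned core_id;
    int n_flows;
};

/* Hypothetical per-thread insert; always succeeds in this sketch. */
static int
put_on_pmd(struct pmd_flows *pmd)
{
    pmd->n_flows++;
    return 0;
}

/* Installs a flow on the pmd running on 'pmd_id', or on every pmd when
 * 'pmd_id' is PMD_ID_NULL.  Returns the first error seen, or 0. */
static int
flow_put_sketch(struct pmd_flows *pmds, size_t n_pmds, unsigned pmd_id)
{
    assert(pmds != NULL || n_pmds == 0);

    if (pmd_id == PMD_ID_NULL) {
        int first_error = 0;

        for (size_t i = 0; i < n_pmds; i++) {
            int error = put_on_pmd(&pmds[i]);

            if (error && !first_error) {
                first_error = error;
            }
        }
        return first_error;
    }

    for (size_t i = 0; i < n_pmds; i++) {
        if (pmds[i].core_id == pmd_id) {
            return put_on_pmd(&pmds[i]);
        }
    }
    return EINVAL;  /* No pmd thread runs on that core. */
}
```

Visiting every pmd and reporting only the first error is a choice made for this sketch; the patch itself may treat partial failures differently.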

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpctl.c   | 107 
 lib/dpif-netdev.c | 180 +-
 lib/dpif.c|   6 +-
 lib/dpif.h|  12 +++-
 tests/pmd.at  |  44 +
 5 files changed, 194 insertions(+), 155 deletions(-)

diff --git a/lib/dpctl.c b/lib/dpctl.c
index 7011b183d..23837ce74 100644
--- a/lib/dpctl.c
+++ b/lib/dpctl.c
@@ -40,7 +40,6 @@
 #include "netlink.h"
 #include "odp-util.h"
 #include "openvswitch/ofpbuf.h"
-#include "ovs-numa.h"
 #include "packets.h"
 #include "openvswitch/shash.h"
 #include "simap.h"
@@ -876,43 +875,12 @@ out_freefilter:
 return error;
 }
 
-/* Extracts the in_port from the parsed keys, and returns the reference
- * to the 'struct netdev *' of the dpif port.  On error, returns NULL.
- * Users must call 'netdev_close()' after finish using the returned
- * reference. */
-static struct netdev *
-get_in_port_netdev_from_key(struct dpif *dpif, const struct ofpbuf *key)
-{
-const struct nlattr *in_port_nla;
-struct netdev *dev = NULL;
-
-in_port_nla = nl_attr_find(key, 0, OVS_KEY_ATTR_IN_PORT);
-if (in_port_nla) {
-struct dpif_port dpif_port;
-odp_port_t port_no;
-int error;
-
-port_no = ODP_PORT_C(nl_attr_get_u32(in_port_nla));
-error = dpif_port_query_by_number(dpif, port_no, &dpif_port);
-if (error) {
-goto out;
-}
-
-netdev_open(dpif_port.name, dpif_port.type, &dev);
-dpif_port_destroy(&dpif_port);
-}
-
-out:
-return dev;
-}
-
 static int
 dpctl_put_flow(int argc, const char *argv[], enum dpif_flow_put_flags flags,
struct dpctl_params *dpctl_p)
 {
 const char *key_s = argv[argc - 2];
 const char *actions_s = argv[argc - 1];
-struct netdev *in_port_netdev = NULL;
 struct dpif_flow_stats stats;
 struct dpif_port dpif_port;
 struct dpif_port_dump port_dump;
@@ -968,39 +936,15 @@ dpctl_put_flow(int argc, const char *argv[], enum dpif_flow_put_flags flags,
 goto out_freeactions;
 }
 
-/* For DPDK interface, applies the operation to all pmd threads
- * on the same numa node. */
-in_port_netdev = get_in_port_netdev_from_key(dpif, &key);
-if (in_port_netdev && netdev_is_pmd(in_port_netdev)) {
-int numa_id;
-
-numa_id = netdev_get_numa_id(in_port_netdev);
-if (ovs_numa_numa_id_is_valid(numa_id)) {
-struct ovs_numa_dump *dump = ovs_numa_dump_cores_on_numa(numa_id);
-struct ovs_numa_info *iter;
-
-FOR_EACH_CORE_ON_NUMA (iter, dump) {
-if (ovs_numa_core_is_pinned(iter->core_id)) {
-error = dpif_flow_put(dpif, flags,
-  key.data, key.size,
-  mask.size == 0 ? NULL : mask.data,
-  mask.size, actions.data,
-  actions.size, ufid_present ? &ufid : NULL,
-  iter->core_id, dpctl_p->print_statistics ? &stats : NULL);
-}
-}
-ovs_numa_dump_destroy(dump);
-} else {
-error = EINVAL;
-}
-} else {
-error = dpif_flow_put(dpif, flags,
-  key.data, key.size,
-  mask.size == 0 ? NULL : mask.data,
-  mask.size, actions.data,
-  actions.size, ufid_present ? &ufid : NULL,
-  PMD_ID_NULL, dpctl_p->print_statistics ? &stats : NULL);
-}
+/* The flow will be added on all pmds currently in the datapath. */
+error = dpif_flow_put(dpif, flags,
+  key.data, key.size,
+  mask.size == 0 ? NULL : mask.data,
+  mask.size, actions.data,
+  actions.size, ufid_present ? &ufid : NULL,
+  PMD_ID_NULL,
+  dpctl_p->print_statistics ? &stats : NULL);
+
 if (error) {
 dpctl_error(dpctl_p, error, "updating flow table");
 goto out_freeactions;
@@ -1021,7 +965,6 @@ out_freekeymask:
 ofpbuf_uninit(&mask);
 ofpbuf_uninit(&key);
 dpif_close(dpif);
-ne

[ovs-dev] [PATCH v3 15/18] ovs-numa: Add per numa and global counts in dump.

2017-01-08 Thread Daniele Di Proietto
They will be used by a future commit.
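
As a rough illustration of per-numa and global counts, the sketch below keeps both up to date as cores are added to a dump. It assumes a small fixed number of numa nodes (`MAX_NUMAS`, hypothetical) and uses an array where the patch uses an hmap of `struct ovs_numa_info_numa`. `struct dump_counts` and `dump_counts_add()` are illustrative names only.

```c
#include <assert.h>

#define MAX_NUMAS 8  /* Hypothetical bound; the patch uses an hmap instead. */

/* Simplified dump: a global core count plus per-numa core counts. */
struct dump_counts {
    int n_cores;                 /* All cores in the dump. */
    int n_numas;                 /* Numa nodes with at least one core. */
    int n_cores_on[MAX_NUMAS];   /* Cores per numa node. */
};

/* Records one core on 'numa_id'.  The numa node is counted the first
 * time it is seen, much like ovs_numa_dump_add() either bumps 'n_cores'
 * on an existing numa entry or inserts a new one. */
static void
dump_counts_add(struct dump_counts *d, int numa_id)
{
    assert(numa_id >= 0 && numa_id < MAX_NUMAS);

    if (d->n_cores_on[numa_id]++ == 0) {
        d->n_numas++;
    }
    d->n_cores++;
}
```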

Suggested-by: Ilya Maximets <i.maxim...@samsung.com>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ovs-numa.c | 96 +-
 lib/ovs-numa.h | 18 +--
 2 files changed, 77 insertions(+), 37 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index f8a37b1ea..04225a958 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -489,25 +489,53 @@ ovs_numa_unpin_core(unsigned core_id)
 }
 }
 
+static struct ovs_numa_dump *
+ovs_numa_dump_create(void)
+{
+struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+
+hmap_init(&dump->cores);
+hmap_init(&dump->numas);
+
+return dump;
+}
+
+static void
+ovs_numa_dump_add(struct ovs_numa_dump *dump, int numa_id, int core_id)
+{
+struct ovs_numa_info_core *c = xzalloc(sizeof *c);
+struct ovs_numa_info_numa *n;
+
+c->numa_id = numa_id;
+c->core_id = core_id;
+hmap_insert(&dump->cores, &c->hmap_node, hash_2words(numa_id, core_id));
+
+HMAP_FOR_EACH_WITH_HASH (n, hmap_node, hash_int(numa_id, 0),
+ &dump->numas) {
+if (n->numa_id == numa_id) {
+n->n_cores++;
+return;
+}
+}
+
+n = xzalloc(sizeof *n);
+n->numa_id = numa_id;
+n->n_cores = 1;
+hmap_insert(&dump->numas, &n->hmap_node, hash_int(numa_id, 0));
+}
+
 /* Given the 'numa_id', returns dump of all cores on the numa node. */
 struct ovs_numa_dump *
 ovs_numa_dump_cores_on_numa(int numa_id)
 {
-struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+struct ovs_numa_dump *dump = ovs_numa_dump_create();
 struct numa_node *numa = get_numa_by_numa_id(numa_id);
 
-hmap_init(&dump->dump);
-
 if (numa) {
 struct cpu_core *core;
 
-LIST_FOR_EACH(core, list_node, &numa->cores) {
-struct ovs_numa_info *info = xmalloc(sizeof *info);
-
-info->numa_id = numa->numa_id;
-info->core_id = core->core_id;
-hmap_insert(&dump->dump, &info->hmap_node,
-hash_2words(info->numa_id, info->core_id));
+LIST_FOR_EACH (core, list_node, &numa->cores) {
+ovs_numa_dump_add(dump, numa->numa_id, core->core_id);
 }
 }
 
@@ -517,12 +545,10 @@ ovs_numa_dump_cores_on_numa(int numa_id)
 struct ovs_numa_dump *
 ovs_numa_dump_cores_with_cmask(const char *cmask)
 {
-struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+struct ovs_numa_dump *dump = ovs_numa_dump_create();
 int core_id = 0;
 int end_idx;
 
-hmap_init(&dump->dump);
-
 /* Ignore leading 0x. */
 end_idx = 0;
 if (!strncmp(cmask, "0x", 2) || !strncmp(cmask, "0X", 2)) {
@@ -547,12 +573,9 @@ ovs_numa_dump_cores_with_cmask(const char *cmask)
 struct cpu_core *core = get_core_by_core_id(core_id);
 
 if (core) {
-struct ovs_numa_info *info = xmalloc(sizeof *info);
-
-info->numa_id = core->numa->numa_id;
-info->core_id = core->core_id;
-hmap_insert(&dump->dump, &info->hmap_node,
-hash_2words(info->numa_id, info->core_id));
+ovs_numa_dump_add(dump,
+  core->numa->numa_id,
+  core->core_id);
 }
 }
 
@@ -566,11 +589,9 @@ ovs_numa_dump_cores_with_cmask(const char *cmask)
 struct ovs_numa_dump *
 ovs_numa_dump_n_cores_per_numa(int cores_per_numa)
 {
-struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+struct ovs_numa_dump *dump = ovs_numa_dump_create();
 const struct numa_node *n;
 
-hmap_init(&dump->dump);
-
 HMAP_FOR_EACH (n, hmap_node, &all_numa_nodes) {
 const struct cpu_core *core;
 int i = 0;
@@ -580,12 +601,7 @@ ovs_numa_dump_n_cores_per_numa(int cores_per_numa)
 break;
 }
 
-struct ovs_numa_info *info = xmalloc(sizeof *info);
-
-info->numa_id = core->numa->numa_id;
-info->core_id = core->core_id;
-hmap_insert(&dump->dump, &info->hmap_node,
-hash_2words(info->numa_id, info->core_id));
+ovs_numa_dump_add(dump, core->numa->numa_id, core->core_id);
 }
 }
 
@@ -596,10 +612,10 @@ bool
 ovs_numa_dump_contains_core(const struct ovs_numa_dump *dump,
 int numa_id, unsigned core_id)
 {
-struct ovs_numa_info *core;
+struct ovs_numa_info_core *core;
 
 HMAP_FOR_EACH_WITH_HASH (core, hmap_node, hash_2words(numa_id, core_id),
- &dump->dump) {
+ &dump->cores) {
 if (core->core_id == core_id && core->numa_id == numa_id) {
 return true;
 }
@@ -608,20 +624,32 @@ ovs_numa_dump_con

[ovs-dev] [PATCH v3 14/18] ovs-numa: Don't use hmap_first_with_hash().

2017-01-08 Thread Daniele Di Proietto
I think it's better to iterate over the hmap than to use
hmap_first_with_hash(), because iterating with a key comparison handles
hash collisions.
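
To make the collision argument concrete, here is a minimal chained hash table with a deliberately weak hash, so that two different keys share a hash value. `first_with_hash()` mimics the old pattern (return the first node whose hash matches, never check the key) and `lookup()` mimics HMAP_FOR_EACH_WITH_HASH followed by a key comparison. All names are hypothetical; this is not the OVS hmap API.

```c
#include <assert.h>
#include <stddef.h>

#define N_BUCKETS 4

/* Hypothetical node; stands in for a struct embedding an hmap_node. */
struct node {
    unsigned key;
    unsigned hash;
    struct node *next;
};

struct table {
    struct node *buckets[N_BUCKETS];
};

/* Deliberately weak hash: all even keys collide, as do all odd keys. */
static unsigned
weak_hash(unsigned key)
{
    return key % 2;
}

static void
table_insert(struct table *t, struct node *n, unsigned key)
{
    size_t b;

    assert(t && n);
    n->key = key;
    n->hash = weak_hash(key);
    b = n->hash % N_BUCKETS;
    n->next = t->buckets[b];     /* Push at the head of the bucket. */
    t->buckets[b] = n;
}

/* Old pattern: first node with a matching hash; the key is never checked. */
static struct node *
first_with_hash(const struct table *t, unsigned hash)
{
    for (struct node *n = t->buckets[hash % N_BUCKETS]; n; n = n->next) {
        if (n->hash == hash) {
            return n;
        }
    }
    return NULL;
}

/* New pattern: walk every node with the hash and compare the key too. */
static struct node *
lookup(const struct table *t, unsigned key)
{
    unsigned hash = weak_hash(key);

    for (struct node *n = t->buckets[hash % N_BUCKETS]; n; n = n->next) {
        if (n->hash == hash && n->key == key) {
            return n;
        }
    }
    return NULL;
}
```

With keys 2 and 4 both hashing to 0, `first_with_hash()` returns whichever node sits first in the bucket (here the one for key 4), while `lookup()` returns the node whose key actually matches.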

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ovs-numa.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index 61c31cf69..f8a37b1ea 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -241,30 +241,32 @@ discover_numa_and_core(void)
 static struct cpu_core*
 get_core_by_core_id(unsigned core_id)
 {
-struct cpu_core *core = NULL;
+struct cpu_core *core;
 
-if (ovs_numa_core_id_is_valid(core_id)) {
-core = CONTAINER_OF(hmap_first_with_hash(&all_cpu_cores,
- hash_int(core_id, 0)),
-struct cpu_core, hmap_node);
+HMAP_FOR_EACH_WITH_HASH (core, hmap_node, hash_int(core_id, 0),
+ &all_cpu_cores) {
+if (core->core_id == core_id) {
+return core;
+}
 }
 
-return core;
+return NULL;
 }
 
 /* Gets 'struct numa_node' by 'numa_id'. */
 static struct numa_node*
 get_numa_by_numa_id(int numa_id)
 {
-struct numa_node *numa = NULL;
+struct numa_node *numa;
 
-if (ovs_numa_numa_id_is_valid(numa_id)) {
-numa = CONTAINER_OF(hmap_first_with_hash(&all_numa_nodes,
- hash_int(numa_id, 0)),
-struct numa_node, hmap_node);
+HMAP_FOR_EACH_WITH_HASH (numa, hmap_node, hash_int(numa_id, 0),
+ &all_numa_nodes) {
+if (numa->numa_id == numa_id) {
+return numa;
+}
 }
 
-return numa;
+return NULL;
 }
 
 
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 12/18] ovs-numa: New ovs_numa_dump_contains_core() function.

2017-01-08 Thread Daniele Di Proietto
It will be used by a future commit.  struct ovs_numa_dump now uses an
hmap instead of a list to make ovs_numa_dump_contains_core() more
efficient.
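
Switching to a hash keyed on both ids means a membership test no longer has to walk the whole dump. A minimal open-addressing sketch of such a (numa_id, core_id) set follows. `hash_pair()` is a hypothetical stand-in for hash_2words(), and the fixed `SET_SIZE` table with linear probing is a simplification, not how struct hmap works internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SET_SIZE 64  /* Power of two; must exceed the number of entries. */

/* Open-addressing set of (numa_id, core_id) pairs. */
struct core_set {
    int count;
    bool used[SET_SIZE];
    int numa_id[SET_SIZE];
    unsigned core_id[SET_SIZE];
};

/* Hypothetical stand-in for hash_2words(): mixes both ids. */
static uint32_t
hash_pair(int numa_id, unsigned core_id)
{
    uint32_t h = (uint32_t) numa_id * 0x9e3779b1u;

    h ^= core_id + 0x7f4a7c15u + (h << 6) + (h >> 2);
    return h;
}

/* Returns the slot holding the pair, or the empty slot where it belongs.
 * Terminates because at least one slot is always kept free. */
static uint32_t
find_slot(const struct core_set *s, int numa_id, unsigned core_id)
{
    uint32_t i = hash_pair(numa_id, core_id) & (SET_SIZE - 1);

    while (s->used[i]
           && !(s->numa_id[i] == numa_id && s->core_id[i] == core_id)) {
        i = (i + 1) & (SET_SIZE - 1);   /* Linear probing. */
    }
    return i;
}

static void
core_set_add(struct core_set *s, int numa_id, unsigned core_id)
{
    uint32_t i = find_slot(s, numa_id, core_id);

    if (!s->used[i]) {
        assert(s->count < SET_SIZE - 1);  /* Keep one slot free. */
        s->used[i] = true;
        s->numa_id[i] = numa_id;
        s->core_id[i] = core_id;
        s->count++;
    }
}

static bool
core_set_contains(const struct core_set *s, int numa_id, unsigned core_id)
{
    return s->used[find_slot(s, numa_id, core_id)];
}
```

Both ids are compared on every probe, for the same reason ovs_numa_dump_contains_core() checks both fields after matching the hash.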

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ovs-numa.c | 25 ++---
 lib/ovs-numa.h | 10 ++
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index e1e7068a2..85f392a91 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -494,7 +494,7 @@ ovs_numa_dump_cores_on_numa(int numa_id)
 struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
 struct numa_node *numa = get_numa_by_numa_id(numa_id);
 
-ovs_list_init(&dump->dump);
+hmap_init(&dump->dump);
 
 if (numa) {
 struct cpu_core *core;
@@ -504,13 +504,30 @@ ovs_numa_dump_cores_on_numa(int numa_id)
 
 info->numa_id = numa->numa_id;
 info->core_id = core->core_id;
-ovs_list_insert(&dump->dump, &info->list_node);
+hmap_insert(&dump->dump, &info->hmap_node,
+hash_2words(info->numa_id, info->core_id));
 }
 }
 
 return dump;
 }
 
+bool
+ovs_numa_dump_contains_core(const struct ovs_numa_dump *dump,
+int numa_id, unsigned core_id)
+{
+struct ovs_numa_info *core;
+
+HMAP_FOR_EACH_WITH_HASH (core, hmap_node, hash_2words(numa_id, core_id),
+ &dump->dump) {
+if (core->core_id == core_id && core->numa_id == numa_id) {
+return true;
+}
+}
+
+return false;
+}
+
 void
 ovs_numa_dump_destroy(struct ovs_numa_dump *dump)
 {
@@ -520,10 +537,12 @@ ovs_numa_dump_destroy(struct ovs_numa_dump *dump)
 return;
 }
 
-LIST_FOR_EACH_POP (iter, list_node, &dump->dump) {
+HMAP_FOR_EACH_POP (iter, hmap_node, &dump->dump) {
 free(iter);
 }
 
+hmap_destroy(&dump->dump);
+
 free(dump);
 }
 
diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
index be836b2ca..c0eae07d8 100644
--- a/lib/ovs-numa.h
+++ b/lib/ovs-numa.h
@@ -21,19 +21,19 @@
 #include 
 
 #include "compiler.h"
-#include "openvswitch/list.h"
+#include "openvswitch/hmap.h"
 
 #define OVS_CORE_UNSPEC INT_MAX
 #define OVS_NUMA_UNSPEC INT_MAX
 
 /* Dump of a list of 'struct ovs_numa_info'. */
 struct ovs_numa_dump {
-struct ovs_list dump;
+struct hmap dump;
 };
 
 /* A numa_id - core_id pair. */
 struct ovs_numa_info {
-struct ovs_list list_node;
+struct hmap_node hmap_node;
 int numa_id;
 unsigned core_id;
 };
@@ -54,10 +54,12 @@ unsigned ovs_numa_get_unpinned_core_any(void);
 unsigned ovs_numa_get_unpinned_core_on_numa(int numa_id);
 void ovs_numa_unpin_core(unsigned core_id);
 struct ovs_numa_dump *ovs_numa_dump_cores_on_numa(int numa_id);
+bool ovs_numa_dump_contains_core(const struct ovs_numa_dump *,
+ int numa_id, unsigned core_id);
 void ovs_numa_dump_destroy(struct ovs_numa_dump *);
 int ovs_numa_thread_setaffinity_core(unsigned core_id);
 
 #define FOR_EACH_CORE_ON_NUMA(ITER, DUMP)\
-LIST_FOR_EACH((ITER), list_node, &(DUMP)->dump)
+HMAP_FOR_EACH((ITER), hmap_node, &(DUMP)->dump)
 
 #endif /* ovs-numa.h */
-- 
2.11.0


[ovs-dev] [PATCH v3 10/18] dpif-netdev: Make 'static_tx_qid' const.

2017-01-08 Thread Daniele Di Proietto
Since the previous commit, 'static_tx_qid' doesn't need to be atomic and is
never modified after initialization, so it can be made const.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 432bac814..436f945b7 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -496,7 +496,7 @@ struct dp_netdev_pmd_thread {
 /* Queue id used by this pmd thread to send packets on all netdevs if
  * XPS disabled for this netdev. All static_tx_qid's are unique and less
  * than 'ovs_numa_get_n_cores() + 1'. */
-atomic_int static_tx_qid;
+const int static_tx_qid;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
@@ -3286,10 +3286,9 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
 pmd->numa_id = numa_id;
 pmd->poll_cnt = 0;
 
-atomic_init(&pmd->static_tx_qid,
-(core_id == NON_PMD_CORE_ID)
-? ovs_numa_get_n_cores()
-: get_n_pmd_threads(dp));
+*CONST_CAST(int *, &pmd->static_tx_qid) = (core_id == NON_PMD_CORE_ID)
+  ? ovs_numa_get_n_cores()
+  : get_n_pmd_threads(dp);
 
 ovs_refcount_init(&pmd->ref_cnt);
 latch_init(&pmd->exit_latch);
@@ -4394,7 +4393,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
 if (dynamic_txqs) {
 tx_qid = dpif_netdev_xps_get_tx_qid(pmd, p, now);
 } else {
-atomic_read_relaxed(&pmd->static_tx_qid, &tx_qid);
+tx_qid = pmd->static_tx_qid;
 }
 
 netdev_send(p->port->netdev, tx_qid, packets_, may_steal,
-- 
2.11.0


[ovs-dev] [PATCH v3 09/18] dpif-netdev: Create pmd threads for every numa node.

2017-01-08 Thread Daniele Di Proietto
A lot of the complexity in the code that handles pmd threads and ports
in dpif-netdev is due to the fact that we postpone the creation of pmd
threads on a numa node until we have a port that needs to be polled on
that particular node.

Since the previous commit, a pmd thread with no ports will not consume
any CPU, so it seems easier to create all the threads at once.

This will also make future commits easier.
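
The simplification can be sketched as a single rule: start all pmd threads when the first pmd port is added, and stop them when the last one is removed, with no per-numa bookkeeping. The sketch assumes that starting is a no-op when the threads already run. `struct dp_sim`, `add_port()` and `del_port()` are hypothetical names, and ports are tracked in a fixed array.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PORTS 16  /* Hypothetical bound for the sketch. */

/* Hypothetical, reduced datapath: tracks which ports are pmd ports and
 * whether the single set of pmd threads (all numa nodes) is running. */
struct dp_sim {
    bool port_used[MAX_PORTS];
    bool port_is_pmd[MAX_PORTS];
    bool pmds_running;
    int n_starts;
};

static bool
has_pmd_port(const struct dp_sim *dp)
{
    for (int i = 0; i < MAX_PORTS; i++) {
        if (dp->port_used[i] && dp->port_is_pmd[i]) {
            return true;
        }
    }
    return false;
}

static void
add_port(struct dp_sim *dp, int port, bool is_pmd)
{
    assert(port >= 0 && port < MAX_PORTS && !dp->port_used[port]);
    dp->port_used[port] = true;
    dp->port_is_pmd[port] = is_pmd;
    if (is_pmd && !dp->pmds_running) {
        dp->pmds_running = true;   /* Start pmds on every numa node at once. */
        dp->n_starts++;
    }
}

static void
del_port(struct dp_sim *dp, int port)
{
    assert(port >= 0 && port < MAX_PORTS && dp->port_used[port]);
    dp->port_used[port] = false;
    if (dp->port_is_pmd[port] && !has_pmd_port(dp)) {
        dp->pmds_running = false;  /* Last pmd port gone: stop them all. */
    }
}
```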

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c | 208 ++
 tests/pmd.at  |   2 +-
 2 files changed, 69 insertions(+), 141 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index dc24e72dc..432bac814 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -575,8 +575,8 @@ static struct dp_netdev_pmd_thread *dp_netdev_get_pmd(struct dp_netdev *dp,
 static struct dp_netdev_pmd_thread *
 dp_netdev_pmd_get_next(struct dp_netdev *dp, struct cmap_position *pos);
 static void dp_netdev_destroy_all_pmds(struct dp_netdev *dp);
-static void dp_netdev_del_pmds_on_numa(struct dp_netdev *dp, int numa_id);
-static void dp_netdev_set_pmds_on_numa(struct dp_netdev *dp, int numa_id)
+static void dp_netdev_stop_pmds(struct dp_netdev *dp);
+static void dp_netdev_start_pmds(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex);
 static void dp_netdev_pmd_clear_ports(struct dp_netdev_pmd_thread *pmd);
 static void dp_netdev_del_port_from_all_pmds(struct dp_netdev *dp,
@@ -1092,19 +1092,20 @@ dp_netdev_free(struct dp_netdev *dp)
 
 shash_find_and_delete(&dp_netdevs, dp->name);
 
-dp_netdev_destroy_all_pmds(dp);
-ovs_mutex_destroy(&dp->non_pmd_mutex);
-ovsthread_key_delete(dp->per_pmd_key);
-
-conntrack_destroy(&dp->conntrack);
-
 ovs_mutex_lock(&dp->port_mutex);
 HMAP_FOR_EACH_SAFE (port, next, node, &dp->ports) {
 do_del_port(dp, port);
 }
 ovs_mutex_unlock(&dp->port_mutex);
+dp_netdev_destroy_all_pmds(dp);
 cmap_destroy(&dp->poll_threads);
 
+ovs_mutex_destroy(&dp->non_pmd_mutex);
+ovsthread_key_delete(dp->per_pmd_key);
+
+conntrack_destroy(&dp->conntrack);
+
+
 seq_destroy(dp->reconfigure_seq);
 
 seq_destroy(dp->port_seq);
@@ -1348,10 +1349,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type,
 }
 
 if (netdev_is_pmd(port->netdev)) {
-int numa_id = netdev_get_numa_id(port->netdev);
-
-ovs_assert(ovs_numa_numa_id_is_valid(numa_id));
-dp_netdev_set_pmds_on_numa(dp, numa_id);
+dp_netdev_start_pmds(dp);
 }
 
 dp_netdev_add_port_to_pmds(dp, port);
@@ -1493,45 +1491,16 @@ get_n_pmd_threads(struct dp_netdev *dp)
 return cmap_count(&dp->poll_threads) - 1;
 }
 
-static int
-get_n_pmd_threads_on_numa(struct dp_netdev *dp, int numa_id)
-{
-struct dp_netdev_pmd_thread *pmd;
-int n_pmds = 0;
-
-CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
-if (pmd->numa_id == numa_id) {
-n_pmds++;
-}
-}
-
-return n_pmds;
-}
-
-/* Returns 'true' if there is a port with pmd netdev and the netdev is on
- * numa node 'numa_id' or its rx queue assigned to core on that numa node . */
+/* Returns 'true' if there is a port with pmd netdev. */
 static bool
-has_pmd_rxq_for_numa(struct dp_netdev *dp, int numa_id)
+has_pmd_port(struct dp_netdev *dp)
 OVS_REQUIRES(dp->port_mutex)
 {
 struct dp_netdev_port *port;
 
 HMAP_FOR_EACH (port, node, &dp->ports) {
 if (netdev_is_pmd(port->netdev)) {
-int i;
-
-if (netdev_get_numa_id(port->netdev) == numa_id) {
-return true;
-}
-
-for (i = 0; i < port->n_rxq; i++) {
-unsigned core_id = port->rxqs[i].core_id;
-
-if (core_id != OVS_CORE_UNSPEC
-&& ovs_numa_get_numa_id(core_id) == numa_id) {
-return true;
-}
-}
+return true;
 }
 }
 
@@ -1549,14 +1518,9 @@ do_del_port(struct dp_netdev *dp, struct dp_netdev_port *port)
 dp_netdev_del_port_from_all_pmds(dp, port);
 
 if (netdev_is_pmd(port->netdev)) {
-int numa_id = netdev_get_numa_id(port->netdev);
-
-/* PMD threads can not be on invalid numa node. */
-ovs_assert(ovs_numa_numa_id_is_valid(numa_id));
-/* If there is no netdev on the numa node, deletes the pmd threads
- * for that numa. */
-if (!has_pmd_rxq_for_numa(dp, numa_id)) {
-dp_netdev_del_pmds_on_numa(dp, numa_id);
+/* If there is no pmd netdev, delete the pmd threads */
+if (!has_pmd_port(dp)) {
+dp_netdev_stop_pmds(dp);
 }
 }
 
@@ -3407,18 +3371,22 @@ dp_netdev_del_pmd(struct dp_netdev *dp, struct dp_netdev_pmd_thread *pmd)
 dp_netdev_pmd_unref(pmd);
 }
 
-/* Destroys all pmd threads. */
+/* Destroys all pmd threads, but not the non pmd thr

[ovs-dev] [PATCH v3 13/18] ovs-numa: Add new dump types.

2017-01-08 Thread Daniele Di Proietto
They will be used by a future commit.

This patch introduces some code duplication which will be removed in a
future commit.
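
The cmask parsing added by this patch (ovs_numa_dump_cores_with_cmask()) can be exercised in isolation. The sketch below assumes a plain boolean array of cores instead of the OVS core registry. Bits at or beyond `max_cores` are skipped, standing in for get_core_by_core_id() returning NULL, and invalid digits count as 0 without logging. `parse_cmask()` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Sets cores[i] = true for each bit i set in the hex mask 'cmask' (an
 * optional "0x"/"0X" prefix is ignored) and returns how many cores were
 * selected.  As in the patch, the mask is read from its last character,
 * each hex digit covering four consecutive core ids. */
static int
parse_cmask(const char *cmask, bool cores[], int max_cores)
{
    int end_idx = 0;
    int core_id = 0;
    int n = 0;

    assert(cmask && max_cores >= 0);
    for (int i = 0; i < max_cores; i++) {
        cores[i] = false;
    }

    if (!strncmp(cmask, "0x", 2) || !strncmp(cmask, "0X", 2)) {
        end_idx = 2;
    }

    for (int i = (int) strlen(cmask) - 1; i >= end_idx; i--) {
        char hex = toupper((unsigned char) cmask[i]);
        int bin;

        if (hex >= '0' && hex <= '9') {
            bin = hex - '0';
        } else if (hex >= 'A' && hex <= 'F') {
            bin = hex - 'A' + 10;
        } else {
            bin = 0;    /* The patch also emits a VLOG_WARN here. */
        }

        for (int j = 0; j < 4; j++, core_id++) {
            if (((bin >> j) & 0x1) && core_id < max_cores) {
                cores[core_id] = true;
                n++;
            }
        }
    }
    return n;
}
```

For example, "0x5" selects cores 0 and 2, and "F0" selects cores 4 through 7, because the last hex digit covers cores 0 to 3.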

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/ovs-numa.c | 78 ++
 lib/ovs-numa.h |  4 ++-
 2 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index 85f392a91..61c31cf69 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -512,6 +512,84 @@ ovs_numa_dump_cores_on_numa(int numa_id)
 return dump;
 }
 
+struct ovs_numa_dump *
+ovs_numa_dump_cores_with_cmask(const char *cmask)
+{
+struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+int core_id = 0;
+int end_idx;
+
+hmap_init(&dump->dump);
+
+/* Ignore leading 0x. */
+end_idx = 0;
+if (!strncmp(cmask, "0x", 2) || !strncmp(cmask, "0X", 2)) {
+end_idx = 2;
+}
+
+for (int i = strlen(cmask) - 1; i >= end_idx; i--) {
+char hex = toupper((unsigned char) cmask[i]);
+int bin, j;
+
+if (hex >= '0' && hex <= '9') {
+bin = hex - '0';
+} else if (hex >= 'A' && hex <= 'F') {
+bin = hex - 'A' + 10;
+} else {
+VLOG_WARN("Invalid cpu mask: %c", cmask[i]);
+bin = 0;
+}
+
+for (j = 0; j < 4; j++) {
+if ((bin >> j) & 0x1) {
+struct cpu_core *core = get_core_by_core_id(core_id);
+
+if (core) {
+struct ovs_numa_info *info = xmalloc(sizeof *info);
+
+info->numa_id = core->numa->numa_id;
+info->core_id = core->core_id;
+hmap_insert(&dump->dump, &info->hmap_node,
+hash_2words(info->numa_id, info->core_id));
+}
+}
+
+core_id++;
+}
+}
+
+return dump;
+}
+
+struct ovs_numa_dump *
+ovs_numa_dump_n_cores_per_numa(int cores_per_numa)
+{
+struct ovs_numa_dump *dump = xmalloc(sizeof *dump);
+const struct numa_node *n;
+
+hmap_init(&dump->dump);
+
+HMAP_FOR_EACH (n, hmap_node, &all_numa_nodes) {
+const struct cpu_core *core;
+int i = 0;
+
+LIST_FOR_EACH (core, list_node, &n->cores) {
+if (i++ >= cores_per_numa) {
+break;
+}
+
+struct ovs_numa_info *info = xmalloc(sizeof *info);
+
+info->numa_id = core->numa->numa_id;
+info->core_id = core->core_id;
+hmap_insert(&dump->dump, &info->hmap_node,
+hash_2words(info->numa_id, info->core_id));
+}
+}
+
+return dump;
+}
+
 bool
 ovs_numa_dump_contains_core(const struct ovs_numa_dump *dump,
 int numa_id, unsigned core_id)
diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
index c0eae07d8..62bdb225f 100644
--- a/lib/ovs-numa.h
+++ b/lib/ovs-numa.h
@@ -54,12 +54,14 @@ unsigned ovs_numa_get_unpinned_core_any(void);
 unsigned ovs_numa_get_unpinned_core_on_numa(int numa_id);
 void ovs_numa_unpin_core(unsigned core_id);
 struct ovs_numa_dump *ovs_numa_dump_cores_on_numa(int numa_id);
+struct ovs_numa_dump *ovs_numa_dump_cores_with_cmask(const char *cmask);
+struct ovs_numa_dump *ovs_numa_dump_n_cores_per_numa(int n);
 bool ovs_numa_dump_contains_core(const struct ovs_numa_dump *,
  int numa_id, unsigned core_id);
 void ovs_numa_dump_destroy(struct ovs_numa_dump *);
 int ovs_numa_thread_setaffinity_core(unsigned core_id);
 
-#define FOR_EACH_CORE_ON_NUMA(ITER, DUMP)\
+#define FOR_EACH_CORE_ON_DUMP(ITER, DUMP)\
 HMAP_FOR_EACH((ITER), hmap_node, &(DUMP)->dump)
 
 #endif /* ovs-numa.h */
-- 
2.11.0


[ovs-dev] [PATCH v3 08/18] dpif-netdev: Block pmd threads if there are no ports.

2017-01-08 Thread Daniele Di Proietto
There's no reason for a pmd thread to perform its main loop if there are
no queues in its poll_list.

This commit introduces a seq object on which the pmd thread can block
if there are no queues.

When the main thread wants to reload a pmd thread, it must now change
the seq object (in case the thread is blocked) and set 'reload' to true.

This is useful to avoid wasting CPU cycles and is also necessary for a
future commit.
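
A sketch of the block-until-reload idea. It assumes a condition variable is an acceptable stand-in for OVS's seq and poll_block() machinery (the real seq_wait() registers with the poll loop rather than waiting on a condvar). `struct simple_seq`, `struct idle_pmd` and their functions are hypothetical. The idle thread records the sequence value, sleeps until it changes, and is woken by the main thread's `simple_seq_change()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, condition-variable based stand-in for 'struct seq'. */
struct simple_seq {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint64_t value;
};

static void
simple_seq_init(struct simple_seq *s)
{
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->value = 1;
}

static uint64_t
simple_seq_read(struct simple_seq *s)
{
    pthread_mutex_lock(&s->mutex);
    uint64_t v = s->value;
    pthread_mutex_unlock(&s->mutex);
    return v;
}

/* Main thread side: bump the value and wake every waiter. */
static void
simple_seq_change(struct simple_seq *s)
{
    pthread_mutex_lock(&s->mutex);
    s->value++;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->mutex);
}

/* Pmd side: sleep until the value differs from 'last', then return it. */
static uint64_t
simple_seq_wait_change(struct simple_seq *s, uint64_t last)
{
    pthread_mutex_lock(&s->mutex);
    while (s->value == last) {
        pthread_cond_wait(&s->cond, &s->mutex);
    }
    uint64_t v = s->value;
    pthread_mutex_unlock(&s->mutex);
    return v;
}

/* An idle "pmd thread" with no queues: block instead of spinning. */
struct idle_pmd {
    struct simple_seq *reload_seq;
    uint64_t last_reload_seq;
    uint64_t woke_at;
};

static void *
idle_pmd_main(void *arg)
{
    struct idle_pmd *pmd = arg;

    pmd->woke_at = simple_seq_wait_change(pmd->reload_seq,
                                          pmd->last_reload_seq);
    return NULL;
}
```

Because the waiter compares against a previously read value rather than waiting for a one-shot signal, a change made before the thread goes to sleep is not lost.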

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 0d47a3286..dc24e72dc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -485,6 +485,8 @@ struct dp_netdev_pmd_thread {
 unsigned long long last_cycles;
 
 struct latch exit_latch;/* For terminating the pmd thread. */
+struct seq *reload_seq;
+uint64_t last_reload_seq;
 atomic_bool reload; /* Do we need to reload ports? */
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
@@ -1209,6 +1211,7 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
 }
 
 ovs_mutex_lock(&pmd->cond_mutex);
+seq_change(pmd->reload_seq);
 atomic_store_relaxed(&pmd->reload, true);
 ovs_mutex_cond_wait(&pmd->cond, &pmd->cond_mutex);
 ovs_mutex_unlock(&pmd->cond_mutex);
@@ -3148,6 +3151,14 @@ reload:
 netdev_rxq_get_queue_id(poll_list[i].rx));
 }
 
+if (!poll_cnt) {
+while (seq_read(pmd->reload_seq) == pmd->last_reload_seq) {
+seq_wait(pmd->reload_seq, pmd->last_reload_seq);
+poll_block();
+}
+lc = 1025;
+}
+
 for (;;) {
 for (i = 0; i < poll_cnt; i++) {
 dp_netdev_process_rxq_port(pmd, poll_list[i].port, poll_list[i].rx);
@@ -3223,6 +3234,7 @@ dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd)
 {
 ovs_mutex_lock(&pmd->cond_mutex);
 atomic_store_relaxed(&pmd->reload, false);
+pmd->last_reload_seq = seq_read(pmd->reload_seq);
 xpthread_cond_signal(&pmd->cond);
 ovs_mutex_unlock(&pmd->cond_mutex);
 }
@@ -3317,6 +3329,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
 
 ovs_refcount_init(&pmd->ref_cnt);
 latch_init(&pmd->exit_latch);
+pmd->reload_seq = seq_create();
+pmd->last_reload_seq = seq_read(pmd->reload_seq);
 atomic_init(&pmd->reload, false);
 xpthread_cond_init(&pmd->cond, NULL);
 ovs_mutex_init(&pmd->cond_mutex);
@@ -3356,6 +3370,7 @@ dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
 cmap_destroy(&pmd->flow_table);
 ovs_mutex_destroy(&pmd->flow_mutex);
 latch_destroy(&pmd->exit_latch);
+seq_destroy(pmd->reload_seq);
 xpthread_cond_destroy(&pmd->cond);
 ovs_mutex_destroy(&pmd->cond_mutex);
 ovs_mutex_destroy(&pmd->port_mutex);
-- 
2.11.0


[ovs-dev] [PATCH v3 07/18] dpif-netdev: Use a boolean instead of pmd->port_seq.

2017-01-08 Thread Daniele Di Proietto
There's no need for a sequence number, since the main thread has to wait
for the pmd thread, so there's no chance that an update will go
undetected.

A seq object will be introduced for another purpose in the next commit,
and changing this to boolean makes the code more readable.
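
The handshake that makes a plain flag sufficient can be sketched with pthreads and C11 atomics. The names are hypothetical and the pmd loop is reduced to polling the flag: the main thread sets `reload` while holding `cond_mutex` and waits until the pmd thread clears it. Unlike the single ovs_mutex_cond_wait() call in the patch, this sketch re-checks the flag in a loop, which also tolerates spurious wakeups.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pmd fields involved in a reload. */
struct pmd_sim {
    pthread_mutex_t cond_mutex;
    pthread_cond_t cond;
    atomic_bool reload;      /* Set by the main thread, cleared by the pmd. */
    int reloads_wanted;      /* Reloads the pmd serves before exiting. */
    int reloads_done;        /* Only touched by the pmd thread. */
};

static void
pmd_sim_init(struct pmd_sim *pmd, int reloads_wanted)
{
    pthread_mutex_init(&pmd->cond_mutex, NULL);
    pthread_cond_init(&pmd->cond, NULL);
    atomic_init(&pmd->reload, false);
    pmd->reloads_wanted = reloads_wanted;
    pmd->reloads_done = 0;
}

/* Main thread: request a reload and wait until the pmd acknowledges it.
 * Because the caller blocks until the flag is cleared, no update can be
 * missed, so a boolean is enough. */
static void
pmd_sim_request_reload(struct pmd_sim *pmd)
{
    pthread_mutex_lock(&pmd->cond_mutex);
    atomic_store_explicit(&pmd->reload, true, memory_order_relaxed);
    while (atomic_load_explicit(&pmd->reload, memory_order_relaxed)) {
        pthread_cond_wait(&pmd->cond, &pmd->cond_mutex);
    }
    pthread_mutex_unlock(&pmd->cond_mutex);
}

/* Pmd thread: poll the flag, "reload", then clear it and signal back. */
static void *
pmd_sim_main(void *arg)
{
    struct pmd_sim *pmd = arg;

    while (pmd->reloads_done < pmd->reloads_wanted) {
        if (atomic_load_explicit(&pmd->reload, memory_order_relaxed)) {
            pmd->reloads_done++;    /* Re-reading the poll list goes here. */
            pthread_mutex_lock(&pmd->cond_mutex);
            atomic_store_explicit(&pmd->reload, false, memory_order_relaxed);
            pthread_cond_signal(&pmd->cond);
            pthread_mutex_unlock(&pmd->cond_mutex);
        }
    }
    return NULL;
}
```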

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 004b28dc8..0d47a3286 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -485,7 +485,7 @@ struct dp_netdev_pmd_thread {
 unsigned long long last_cycles;
 
 struct latch exit_latch;/* For terminating the pmd thread. */
-atomic_uint change_seq; /* For reloading pmd ports. */
+atomic_bool reload; /* Do we need to reload ports? */
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
@@ -526,8 +526,6 @@ struct dp_netdev_pmd_thread {
 uint64_t cycles_zero[PMD_N_CYCLES];
 };
 
-#define PMD_INITIAL_SEQ 1
-
 /* Interface to netdev-based datapath. */
 struct dpif_netdev {
 struct dpif dpif;
@@ -1201,8 +1199,6 @@ dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
 static void
 dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
 {
-int old_seq;
-
 if (pmd->core_id == NON_PMD_CORE_ID) {
 ovs_mutex_lock(&pmd->dp->non_pmd_mutex);
 ovs_mutex_lock(&pmd->port_mutex);
@@ -1213,7 +1209,7 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
 }
 
 ovs_mutex_lock(&pmd->cond_mutex);
-atomic_add_relaxed(&pmd->change_seq, 1, &old_seq);
+atomic_store_relaxed(&pmd->reload, true);
 ovs_mutex_cond_wait(&pmd->cond, &pmd->cond_mutex);
 ovs_mutex_unlock(&pmd->cond_mutex);
 }
@@ -3131,7 +3127,6 @@ pmd_thread_main(void *f_)
 struct dp_netdev_pmd_thread *pmd = f_;
 unsigned int lc = 0;
 struct rxq_poll *poll_list;
-unsigned int port_seq = PMD_INITIAL_SEQ;
 bool exiting;
 int poll_cnt;
 int i;
@@ -3159,7 +3154,7 @@ reload:
 }
 
 if (lc++ > 1024) {
-unsigned int seq;
+bool reload;
 
 lc = 0;
 
@@ -3169,9 +3164,8 @@ reload:
 emc_cache_slow_sweep(&pmd->flow_cache);
 }
 
-atomic_read_relaxed(&pmd->change_seq, &seq);
-if (seq != port_seq) {
-port_seq = seq;
+atomic_read_relaxed(&pmd->reload, &reload);
+if (reload) {
 break;
 }
 }
@@ -3228,6 +3222,7 @@ static void
 dp_netdev_pmd_reload_done(struct dp_netdev_pmd_thread *pmd)
 {
 ovs_mutex_lock(&pmd->cond_mutex);
+atomic_store_relaxed(&pmd->reload, false);
 xpthread_cond_signal(&pmd->cond);
 ovs_mutex_unlock(&pmd->cond_mutex);
 }
@@ -3322,7 +3317,7 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
 
 ovs_refcount_init(&pmd->ref_cnt);
 latch_init(&pmd->exit_latch);
-atomic_init(&pmd->change_seq, PMD_INITIAL_SEQ);
+atomic_init(&pmd->reload, false);
 xpthread_cond_init(&pmd->cond, NULL);
 ovs_mutex_init(&pmd->cond_mutex);
 ovs_mutex_init(&pmd->flow_mutex);
-- 
2.11.0


[ovs-dev] [PATCH v3 06/18] netdev-dpdk: Refactor construct and destruct.

2017-01-08 Thread Daniele Di Proietto
Some refactoring for _construct() and _destruct() methods:
* Rename netdev_dpdk_init() to common_construct(). init() has a
  different meaning in the netdev context.
* Remove the DPDK_DEV_ETH and DPDK_DEV_VHOST checks from common_construct()
  and move them into separate functions.
* Introduce common_destruct().
* Avoid taking 'dev->mutex' in construct and destruct: we're guaranteed
  to be the only thread with access to the object.
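
A reduced sketch of the construct split described above. It assumes a device that only carries a type, a port id, a socket id and a tx queue array; `struct dev`, `DEV_ETH` and `DEV_VHOST` are hypothetical, and SOCKET0 is assumed to be 0. Type-specific work (allocating vhost tx queues, choosing the socket) happens in the wrapper, while the shared setup, including the fallback for a negative socket id, lives in common_construct().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SOCKET0 0  /* Fallback socket; assumed value. */

enum dev_type { DEV_ETH, DEV_VHOST };

/* Hypothetical, heavily reduced device. */
struct dev {
    enum dev_type type;
    int port_id;
    int socket_id;
    int *tx_q;          /* Stand-in for the txq array. */
};

/* Type-independent setup: the caller has already chosen the socket. */
static int
common_construct(struct dev *dev, int port_no, enum dev_type type,
                 int socket_id)
{
    dev->socket_id = socket_id < 0 ? SOCKET0 : socket_id;
    dev->port_id = port_no;
    dev->type = type;
    return 0;
}

/* Vhost-specific part: allocate tx queues first, then share the rest. */
static int
vhost_common_construct(struct dev *dev, int master_socket_id, int n_txq)
{
    assert(n_txq > 0);
    dev->tx_q = calloc((size_t) n_txq, sizeof *dev->tx_q);
    if (!dev->tx_q) {
        return ENOMEM;
    }
    return common_construct(dev, -1, DEV_VHOST, master_socket_id);
}

/* Shared teardown, the counterpart of common_construct(). */
static void
common_destruct(struct dev *dev)
{
    free(dev->tx_q);
    dev->tx_q = NULL;
}
```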

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/netdev-dpdk.c | 86 ++-
 1 file changed, 41 insertions(+), 45 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d6315357b..45320e370 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -827,29 +827,20 @@ netdev_dpdk_alloc_txq(unsigned int n_txqs)
 }
 
 static int
-netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
- enum dpdk_dev_type type)
+common_construct(struct netdev *netdev, unsigned int port_no,
+ enum dpdk_dev_type type, int socket_id)
 OVS_REQUIRES(dpdk_mutex)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-int sid;
-int err = 0;
 
 ovs_mutex_init(&dev->mutex);
-ovs_mutex_lock(&dev->mutex);
 
 rte_spinlock_init(&dev->stats_lock);
 
 /* If the 'sid' is negative, it means that the kernel fails
  * to obtain the pci numa info.  In that situation, always
  * use 'SOCKET0'. */
-if (type == DPDK_DEV_ETH && rte_eth_dev_is_valid_port(dev->port_id)) {
-sid = rte_eth_dev_socket_id(port_no);
-} else {
-sid = rte_lcore_to_socket_id(rte_get_master_lcore());
-}
-
-dev->socket_id = sid < 0 ? SOCKET0 : sid;
+dev->socket_id = socket_id < 0 ? SOCKET0 : socket_id;
 dev->requested_socket_id = dev->socket_id;
 dev->port_id = port_no;
 dev->type = type;
@@ -880,21 +871,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
 dev->flags = NETDEV_UP | NETDEV_PROMISC;
 
-if (type == DPDK_DEV_VHOST) {
-dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
-if (!dev->tx_q) {
-err = ENOMEM;
-goto unlock;
-}
-}
-
 ovs_list_push_back(&dpdk_list, &dev->list_node);
 
 netdev_request_reconfigure(netdev);
 
-unlock:
-ovs_mutex_unlock(&dev->mutex);
-return err;
+return 0;
 }
 
 /* dev_name must be the prefix followed by a positive decimal number.
@@ -919,6 +900,21 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
 }
 
 static int
+vhost_common_construct(struct netdev *netdev)
+OVS_REQUIRES(dpdk_mutex)
+{
+int socket_id = rte_lcore_to_socket_id(rte_get_master_lcore());
+struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
+if (!dev->tx_q) {
+return ENOMEM;
+}
+
+return common_construct(netdev, -1, DPDK_DEV_VHOST, socket_id);
+}
+
+static int
 netdev_dpdk_vhost_construct(struct netdev *netdev)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -952,7 +948,7 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
 VLOG_INFO("Socket %s created for vhost-user port %s\n",
   dev->vhost_id, name);
 }
-err = netdev_dpdk_init(netdev, -1, DPDK_DEV_VHOST);
+err = vhost_common_construct(netdev);
 
 ovs_mutex_unlock(_mutex);
 return err;
@@ -964,7 +960,7 @@ netdev_dpdk_vhost_client_construct(struct netdev *netdev)
 int err;
 
 ovs_mutex_lock(_mutex);
-err = netdev_dpdk_init(netdev, -1, DPDK_DEV_VHOST);
+err = vhost_common_construct(netdev);
 ovs_mutex_unlock(_mutex);
 return err;
 }
@@ -975,29 +971,36 @@ netdev_dpdk_construct(struct netdev *netdev)
 int err;
 
 ovs_mutex_lock(_mutex);
-err = netdev_dpdk_init(netdev, -1, DPDK_DEV_ETH);
+err = common_construct(netdev, -1, DPDK_DEV_ETH, SOCKET0);
 ovs_mutex_unlock(_mutex);
 return err;
 }
 
 static void
+common_destruct(struct netdev_dpdk *dev)
+OVS_REQUIRES(dpdk_mutex)
+OVS_EXCLUDED(dev->mutex)
+{
+rte_free(dev->tx_q);
+dpdk_mp_put(dev->dpdk_mp);
+
+ovs_list_remove(>list_node);
+free(ovsrcu_get_protected(struct ingress_policer *,
+  >ingress_policer));
+ovs_mutex_destroy(>mutex);
+}
+
+static void
 netdev_dpdk_destruct(struct netdev *netdev)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 
 ovs_mutex_lock(_mutex);
-ovs_mutex_lock(>mutex);
 
 rte_eth_dev_stop(dev->port_id);
 free(dev->devargs);
-free(ovsrcu_get_protected(struct ingress_policer *,
-  >ingress_policer));
+common_destruct(dev);
 
-rte_free(dev->tx_q);
-ovs_list_remove(>list_node);
-dpdk_mp_put(dev->dpdk_mp);
-
-ovs_mutex_unlock(>mutex);
 ovs_mutex_unlock(_mutex

[ovs-dev] [PATCH v3 03/18] dpif-netdev: Don't try to output on a device without txqs.

2017-01-08 Thread Daniele Di Proietto
Tunnel devices have 0 txqs and don't support netdev_send().  While
netdev_send() simply returns EOPNOTSUPP, the XPS logic is still executed
on output, and that might be confused by devices with no txqs.

It seems better to have separate fast-path structures for ports that
support netdev_{push,pop}_header (tunnel devices) and for ports that
support netdev_send().  With this we can also remove a branch in
netdev_send().

This is also necessary for a future commit, which starts DPDK devices
without txqs.
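
The two-cache idea can be sketched with simplified, hypothetical types (struct
port, struct port_cache, load_caches() and lookup() are illustrative, not the
dpif-netdev structures, which use hmaps).  Each port is sorted by capability: a
port that supports push/pop goes into the tunnel cache, and a port with at
least one txq goes into the send cache.  A port can land in both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified port descriptor; not the real struct tx_port. */
struct port {
    int port_no;
    bool has_tunnel_push_pop;   /* Tunnel device: supports push/pop header. */
    int n_txq;                  /* Zero for tunnel devices. */
};

/* Fixed capacity keeps the sketch short; extra ports are ignored. */
struct port_cache {
    const struct port *ports[16];
    size_t n;
};

static void
cache_add(struct port_cache *c, const struct port *p)
{
    if (c->n < sizeof c->ports / sizeof c->ports[0]) {
        c->ports[c->n++] = p;
    }
}

/* Sort each port into the cache(s) matching its capabilities. */
static void
load_caches(const struct port *ports, size_t n_ports,
            struct port_cache *tnl, struct port_cache *send)
{
    tnl->n = send->n = 0;
    for (size_t i = 0; i < n_ports; i++) {
        if (ports[i].has_tunnel_push_pop) {
            cache_add(tnl, &ports[i]);
        }
        if (ports[i].n_txq) {
            cache_add(send, &ports[i]);
        }
    }
}

/* OUTPUT consults only the send cache, so a 0-txq device is never found. */
static const struct port *
lookup(const struct port_cache *c, int port_no)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->ports[i]->port_no == port_no) {
            return c->ports[i];
        }
    }
    return NULL;
}
```

Because OUTPUT and PUSH/POP consult different caches, a lookup miss replaces
the per-packet capability check.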

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c | 73 +++
 lib/netdev.c  | 35 ++
 lib/netdev.h  |  1 +
 3 files changed, 73 insertions(+), 36 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f600cab00..004b28dc8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -422,7 +422,8 @@ struct rxq_poll {
 struct ovs_list node;
 };
 
-/* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
+/* Contained by struct dp_netdev_pmd_thread's 'send_port_cache',
+ * 'tnl_port_cache' or 'tx_ports'. */
 struct tx_port {
 struct dp_netdev_port *port;
 int qid;
@@ -504,11 +505,18 @@ struct dp_netdev_pmd_thread {
  * read by the pmd thread. */
 struct hmap tx_ports OVS_GUARDED;
 
-/* Map of 'tx_port' used in the fast path. This is a thread-local copy of
- * 'tx_ports'. The instance for cpu core NON_PMD_CORE_ID can be accessed
- * by multiple threads, and thusly need to be protected by 'non_pmd_mutex'.
- * Every other instance will only be accessed by its own pmd thread. */
-struct hmap port_cache;
+/* These are thread-local copies of 'tx_ports'.  One contains only tunnel
+ * ports (that support push_tunnel/pop_tunnel), the other contains ports
+ * with at least one txq (that support send).  A port can be in both.
+ *
+ * There are two separate maps to make sure that we don't try to execute
+ * OUTPUT on a device which has 0 txqs or PUSH/POP on a non-tunnel device.
+ *
+ * The instances for cpu core NON_PMD_CORE_ID can be accessed by multiple
+ * threads, and thusly need to be protected by 'non_pmd_mutex'.  Every
+ * other instance will only be accessed by its own pmd thread. */
+struct hmap tnl_port_cache;
+struct hmap send_port_cache;
 
 /* Only a pmd thread can write on its own 'cycles' and 'stats'.
  * The main thread keeps 'stats_zero' and 'cycles_zero' as base
@@ -3058,7 +3066,10 @@ pmd_free_cached_ports(struct dp_netdev_pmd_thread *pmd)
 /* Free all used tx queue ids. */
 dpif_netdev_xps_revalidate_pmd(pmd, 0, true);
 
-HMAP_FOR_EACH_POP (tx_port_cached, node, >port_cache) {
+HMAP_FOR_EACH_POP (tx_port_cached, node, >tnl_port_cache) {
+free(tx_port_cached);
+}
+HMAP_FOR_EACH_POP (tx_port_cached, node, >send_port_cache) {
 free(tx_port_cached);
 }
 }
@@ -3072,12 +3083,21 @@ pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd)
 struct tx_port *tx_port, *tx_port_cached;
 
 pmd_free_cached_ports(pmd);
-hmap_shrink(>port_cache);
+hmap_shrink(>send_port_cache);
+hmap_shrink(>tnl_port_cache);
 
 HMAP_FOR_EACH (tx_port, node, >tx_ports) {
-tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached);
-hmap_insert(>port_cache, _port_cached->node,
-hash_port_no(tx_port_cached->port->port_no));
+if (netdev_has_tunnel_push_pop(tx_port->port->netdev)) {
+tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached);
+hmap_insert(>tnl_port_cache, _port_cached->node,
+hash_port_no(tx_port_cached->port->port_no));
+}
+
+if (netdev_n_txq(tx_port->port->netdev)) {
+tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached);
+hmap_insert(>send_port_cache, _port_cached->node,
+hash_port_no(tx_port_cached->port->port_no));
+}
 }
 }
 
@@ -3312,7 +3332,8 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
 pmd->next_optimization = time_msec() + DPCLS_OPTIMIZATION_INTERVAL;
 ovs_list_init(>poll_list);
 hmap_init(>tx_ports);
-hmap_init(>port_cache);
+hmap_init(>tnl_port_cache);
+hmap_init(>send_port_cache);
 /* init the 'flow_cache' since there is no
  * actual thread created for NON_PMD_CORE_ID. */
 if (core_id == NON_PMD_CORE_ID) {
@@ -3328,7 +3349,8 @@ dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
 struct dpcls *cls;
 
 dp_netdev_pmd_flow_flush(pmd);
-hmap_destroy(>port_cache);
+hmap_destroy(>send_port_cache);
+hmap_destroy(>tnl_port_cache);
 hmap_destroy(>tx_ports);
 /* All flows (including their dpcls_rules) have been deleted already */
 

[ovs-dev] [PATCH v3 05/18] netdev-dpdk: Start also dpdkr devices only once on port-add.

2017-01-08 Thread Daniele Di Proietto
Since commit 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"),
we don't call rte_eth_dev_start() from netdev_open() anymore; we only call
it from netdev_reconfigure().  This commit does the same for 'dpdkr'
devices and removes some useless code.

Calling rte_eth_dev_start() also from netdev_open() was unnecessary and
wasteful.  Not doing it reduces code duplication and makes adding a port
faster (~900ms before the patch, ~400ms after).

Another reason why this is useful is that some DPDK drivers might have
problems with reconfiguration.  For example, until DPDK commit
8618d19b52b1("net/vmxnet3: reallocate shared memzone on re-config"),
vmxnet3 didn't support being restarted with a different number of
queues.

Technically, the netdev interface has changed: before opening rxqs or
calling netdev_send(), the user must now check whether reconfiguration
is required.  This patch also documents that, even though the userspace
datapath (the only user) requires no change.

Lastly, this patch makes sure the errors returned by ofproto_port_add
(which includes the first port reconfiguration) are reported back to the
database.
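
The "open with nothing started, apply on reconfigure" flow can be sketched as
follows.  In the patch, construction leaves netdev->n_rxq and n_txq at 0, sets
requested_n_rxq and requested_n_txq to NR_QUEUE, and calls
netdev_request_reconfigure().  The struct qdev and qdev_* functions below are
hypothetical, and the 'starts' counter only stands in for the real device
start.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; field names loosely mirror the patch. */
struct qdev {
    int n_rxq, n_txq;                     /* Currently applied. */
    int requested_n_rxq, requested_n_txq; /* Desired, set at construction. */
    int starts;                           /* Times the device was started. */
};

static void
qdev_construct(struct qdev *d, int nr_queue)
{
    d->n_rxq = d->n_txq = 0;   /* Nothing is started at open time. */
    d->requested_n_rxq = d->requested_n_txq = nr_queue;
    d->starts = 0;
}

/* Callers must check this before opening rxqs or sending. */
static bool
qdev_needs_reconfigure(const struct qdev *d)
{
    return d->n_rxq != d->requested_n_rxq || d->n_txq != d->requested_n_txq;
}

/* The single place where the device is (re)started. */
static int
qdev_reconfigure(struct qdev *d)
{
    if (!qdev_needs_reconfigure(d)) {
        return 0;
    }
    d->n_rxq = d->requested_n_rxq;
    d->n_txq = d->requested_n_txq;
    d->starts++;   /* Stands in for the real device start call. */
    return 0;
}
```

A second reconfigure with unchanged requests is a no-op, so the device is
started once per port-add rather than once in open and again in reconfigure.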

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/netdev-dpdk.c | 70 ---
 lib/netdev.c  |  6 -
 vswitchd/bridge.c |  2 ++
 3 files changed, 38 insertions(+), 40 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2df3e220c..d6315357b 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -746,10 +746,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 int diag;
 int n_rxq, n_txq;
 
-if (!rte_eth_dev_is_valid_port(dev->port_id)) {
-return ENODEV;
-}
-
 rte_eth_dev_info_get(dev->port_id, );
 
 n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
@@ -858,30 +854,23 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 dev->port_id = port_no;
 dev->type = type;
 dev->flags = 0;
-dev->requested_mtu = dev->mtu = ETHER_MTU;
+dev->requested_mtu = ETHER_MTU;
 dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
 ovsrcu_index_init(>vid, -1);
 dev->vhost_reconfigured = false;
 
-err = netdev_dpdk_mempool_configure(dev);
-if (err) {
-goto unlock;
-}
-
 ovsrcu_init(>qos_conf, NULL);
 
 ovsrcu_init(>ingress_policer, NULL);
 dev->policer_rate = 0;
 dev->policer_burst = 0;
 
-netdev->n_rxq = NR_QUEUE;
-netdev->n_txq = NR_QUEUE;
-dev->requested_n_rxq = netdev->n_rxq;
-dev->requested_n_txq = netdev->n_txq;
-dev->rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE;
-dev->txq_size = NIC_PORT_DEFAULT_TXQ_SIZE;
-dev->requested_rxq_size = dev->rxq_size;
-dev->requested_txq_size = dev->txq_size;
+netdev->n_rxq = 0;
+netdev->n_txq = 0;
+dev->requested_n_rxq = NR_QUEUE;
+dev->requested_n_txq = NR_QUEUE;
+dev->requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE;
+dev->requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE;
 
 /* Initialize the flow control to NULL */
 memset(>fc_conf, 0, sizeof dev->fc_conf);
@@ -891,25 +880,18 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
 dev->flags = NETDEV_UP | NETDEV_PROMISC;
 
-if (type == DPDK_DEV_ETH) {
-if (rte_eth_dev_is_valid_port(dev->port_id)) {
-err = dpdk_eth_dev_init(dev);
-if (err) {
-goto unlock;
-}
-}
-dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
-} else {
+if (type == DPDK_DEV_VHOST) {
 dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
-}
-
-if (!dev->tx_q) {
-err = ENOMEM;
-goto unlock;
+if (!dev->tx_q) {
+err = ENOMEM;
+goto unlock;
+}
 }
 
 ovs_list_push_back(_list, >list_node);
 
+netdev_request_reconfigure(netdev);
+
 unlock:
 ovs_mutex_unlock(>mutex);
 return err;
@@ -3168,7 +3150,7 @@ out:
 return err;
 }
 
-static void
+static int
 dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
 OVS_REQUIRES(dev->mutex)
 {
@@ -3189,32 +3171,38 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev)
 }
 }
 
+if (!dev->dpdk_mp) {
+return ENOMEM;
+}
+
 if (netdev_dpdk_get_vid(dev) >= 0) {
 dev->vhost_reconfigured = true;
 }
+
+return 0;
 }
 
 static int
 netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+int err;
 
 ovs_mutex_lock(>mutex);
-dpdk_vhost_reconfigure_helper(dev);
+err = dpdk_vhost_reconfigure_helper(dev);
 ovs_mutex_unlock(>mutex);
-return 0;
+
+return err;
 }
 
 static int
 netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
 {
 struct netdev_dpdk *dev = 

[ovs-dev] [PATCH v3 04/18] netdev-dpdk: Don't call rte_dev_stop() in update_flags().

2017-01-08 Thread Daniele Di Proietto
Calling rte_eth_dev_stop() while the device is running causes a crash.

We could use rte_eth_dev_set_link_down(), but not every PMD implements
that, and I found one NIC where that has no effect.

Instead, this commit checks if the device has the NETDEV_UP flag when
transmitting or receiving (similarly to what we do for vhostuser).  I
didn't notice any performance difference with this check when the
device is up.

An alternative would be to remove the device queues from the pmd threads'
tx and rx caches, but that requires reconfiguration, and I'd prefer
to avoid it because the change can come from OpenFlow.
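
The flag check can be sketched like this.  FLAG_UP, struct updev and the
updev_* functions are hypothetical.  Receive returns EAGAIN while the device is
down, as the patch does.  Send drops the batch, which the patch does with
dp_packet_delete_batch().  The tx_dropped counter is an addition made only for
this illustration.

```c
#include <assert.h>
#include <errno.h>

#define FLAG_UP 0x1   /* Hypothetical stand-in for NETDEV_UP. */

struct batch { int count; };

struct updev {
    unsigned flags;
    int tx_packets;
    int tx_dropped;   /* Sketch-only counter. */
};

/* Receive: report "nothing now" instead of polling a down device. */
static int
updev_recv(const struct updev *d, struct batch *b)
{
    if (!(d->flags & FLAG_UP)) {
        return EAGAIN;
    }
    b->count = 1;   /* Pretend one packet arrived. */
    return 0;
}

/* Send: drop the batch when down; the device itself is never stopped. */
static void
updev_send(struct updev *d, struct batch *b)
{
    if (!(d->flags & FLAG_UP)) {
        d->tx_dropped += b->count;
        b->count = 0;
        return;
    }
    d->tx_packets += b->count;
    b->count = 0;
}
```

Toggling the flag is enough to bring the port down or up, so no device stop
and no pmd reconfiguration is needed.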

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/netdev-dpdk.c | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8bb908691..2df3e220c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -783,8 +783,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 mbp_priv = rte_mempool_get_priv(dev->dpdk_mp->mp);
 dev->buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;
 
-dev->flags = NETDEV_UP | NETDEV_PROMISC;
-
 /* Get the Flow control configuration for DPDK-ETH */
 diag = rte_eth_dev_flow_ctrl_get(dev->port_id, >fc_conf);
 if (diag) {
@@ -890,6 +888,9 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
 /* Initilize the hardware offload flags to 0 */
 dev->hw_ol_features = 0;
+
+dev->flags = NETDEV_UP | NETDEV_PROMISC;
+
 if (type == DPDK_DEV_ETH) {
 if (rte_eth_dev_is_valid_port(dev->port_id)) {
 err = dpdk_eth_dev_init(dev);
@@ -900,8 +901,6 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
 } else {
 dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM);
-/* Enable DPDK_DEV_VHOST device and set promiscuous mode flag. */
-dev->flags = NETDEV_UP | NETDEV_PROMISC;
 }
 
 if (!dev->tx_q) {
@@ -1591,6 +1590,10 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch)
 int nb_rx;
 int dropped = 0;
 
+if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
+return EAGAIN;
+}
+
 nb_rx = rte_eth_rx_burst(rx->port_id, rxq->queue_id,
  (struct rte_mbuf **) batch->packets,
  NETDEV_MAX_BURST);
@@ -1821,6 +1824,11 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
struct dp_packet_batch *batch, bool may_steal,
bool concurrent_txq)
 {
+if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
+dp_packet_delete_batch(batch, may_steal);
+return;
+}
+
 if (OVS_UNLIKELY(concurrent_txq)) {
 qid = qid % dev->up.n_txq;
 rte_spinlock_lock(>tx_q[qid].tx_lock);
@@ -2285,8 +2293,6 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
enum netdev_flags *old_flagsp)
 OVS_REQUIRES(dev->mutex)
 {
-int err;
-
 if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) {
 return EINVAL;
 }
@@ -2300,20 +2306,10 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
 }
 
 if (dev->type == DPDK_DEV_ETH) {
-if (dev->flags & NETDEV_UP) {
-err = rte_eth_dev_start(dev->port_id);
-if (err)
-return -err;
-}
-
 if (dev->flags & NETDEV_PROMISC) {
 rte_eth_promiscuous_enable(dev->port_id);
 }
 
-if (!(dev->flags & NETDEV_UP)) {
-rte_eth_dev_stop(dev->port_id);
-}
-
 netdev_change_seq_changed(>up);
 } else {
 /* If DPDK_DEV_VHOST device's NETDEV_UP flag was changed and vhost is
-- 
2.11.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 01/18] dpif-netdev: Fix memory leak.

2017-01-08 Thread Daniele Di Proietto
We keep all the per-port classifiers around, since they can be reused,
but when a pmd thread is destroyed we should free them.
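
The leak follows a common pattern: a "destroy" function releases what an
object owns but not the object itself.  The struct cls and cls_* functions
below are hypothetical.  The sketch calls free() directly.  The patch instead
uses ovsrcu_postpone(free, cls), which defers the free until concurrent RCU
readers can no longer hold the pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical classifier that owns one internal allocation. */
struct cls {
    int *subtables;
};

static struct cls *
cls_create(void)
{
    struct cls *cls = malloc(sizeof *cls);
    if (cls) {
        cls->subtables = calloc(8, sizeof *cls->subtables);
    }
    return cls;
}

/* Frees what the classifier owns, but not the struct itself. */
static void
cls_destroy(struct cls *cls)
{
    free(cls->subtables);
    cls->subtables = NULL;
}

/* Correct teardown: destroy the contents, then free the container.
 * Stopping after cls_destroy() would leak the struct itself. */
static void
cls_teardown(struct cls *cls)
{
    cls_destroy(cls);
    free(cls);   /* The patch defers this step with ovsrcu_postpone(). */
}
```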

Found using valgrind.

Fixes: 3453b4d62a98("dpif-netdev: dpcls per in_port with sorted subtables")

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d1f9661a2..9003f703d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -,6 +,7 @@ dp_netdev_destroy_pmd(struct dp_netdev_pmd_thread *pmd)
 /* All flows (including their dpcls_rules) have been deleted already */
 CMAP_FOR_EACH (cls, node, >classifiers) {
 dpcls_destroy(cls);
+ovsrcu_postpone(free, cls);
 }
 cmap_destroy(>classifiers);
 cmap_destroy(>flow_table);
-- 
2.11.0



[ovs-dev] [PATCH v3 02/18] dpif-netdev: Take non_pmd_mutex to access tx cached ports.

2017-01-08 Thread Daniele Di Proietto
As documented in dp_netdev_pmd_thread, we must take non_pmd_mutex to
access the tx port caches for the non pmd thread.
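
The locking rule can be sketched with POSIX mutexes.  The struct thread_state
and functions below are hypothetical.  A non-NULL shared_mutex stands in for
the NON_PMD_CORE_ID instance, which several threads can reach.  Per-pmd
instances, reached only by their own thread, need no lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state; only the shared instance has a mutex. */
struct thread_state {
    pthread_mutex_t *shared_mutex;   /* NULL for single-owner instances. */
    int *cached_ports;
};

static void
free_cached_ports(struct thread_state *ts)
{
    free(ts->cached_ports);
    ts->cached_ports = NULL;
}

/* Take the mutex only for the instance other threads may be using. */
static void
del_thread_state(struct thread_state *ts)
{
    if (ts->shared_mutex) {
        pthread_mutex_lock(ts->shared_mutex);
        free_cached_ports(ts);
        pthread_mutex_unlock(ts->shared_mutex);
    } else {
        free_cached_ports(ts);
    }
}
```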

Found by inspection.

Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/dpif-netdev.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9003f703d..f600cab00 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3353,8 +3353,10 @@ dp_netdev_del_pmd(struct dp_netdev *dp, struct dp_netdev_pmd_thread *pmd)
 /* NON_PMD_CORE_ID doesn't have a thread, so we don't have to synchronize,
  * but extra cleanup is necessary */
 if (pmd->core_id == NON_PMD_CORE_ID) {
+ovs_mutex_lock(>non_pmd_mutex);
 emc_cache_uninit(>flow_cache);
 pmd_free_cached_ports(pmd);
+ovs_mutex_unlock(>non_pmd_mutex);
 } else {
 latch_set(>exit_latch);
 dp_netdev_reload_pmd__(pmd);
-- 
2.11.0



[ovs-dev] [PATCH] netdev: Add 'errp' to set_config().

2017-01-06 Thread Daniele Di Proietto
Since 55e075e65ef9("netdev-dpdk: Arbitrary 'dpdk' port naming"),
set_config() is used to identify a DPDK device, so it's better to report
its detailed error message to the user.  Tunnel devices and patch ports
rely a lot on set_config() as well.

This commit adds a param to set_config() that can be used to return
an error message and makes use of that in netdev-dpdk and netdev-vport.
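
The error-message out-parameter can be sketched in plain C.  set_error() and
patch_set_config() are hypothetical.  The patch itself fills 'errp' through
helpers such as VLOG_WARN_BUF().  The callback returns an errno value and, on
failure, hands the caller a malloc'd message to show to the user.  The caller
owns that message and must free it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "name: what" in a malloc'd buffer, if the
 * caller passed a non-NULL 'errp'. */
static void
set_error(char **errp, const char *name, const char *what)
{
    if (!errp) {
        return;
    }
    int len = snprintf(NULL, 0, "%s: %s", name, what);
    if (len < 0) {
        *errp = NULL;
        return;
    }
    *errp = malloc((size_t) len + 1);
    if (*errp) {
        snprintf(*errp, (size_t) len + 1, "%s: %s", name, what);
    }
}

/* Sketch of a set_config()-style callback for a patch port. */
static int
patch_set_config(const char *name, const char *peer, char **errp)
{
    if (!peer || !peer[0]) {
        set_error(errp, name, "patch type requires valid 'peer' argument");
        return EINVAL;
    }
    return 0;
}
```

The errno value still drives control flow.  The string only adds detail for
the user, which is why callers can pass NULL when they do not need it.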

Before this patch:

$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': dpdk0: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: could not set configuration (Invalid argument).  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

After this patch:

$ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl: Error detected while setting up 'dpdk0': 'dpdk0' is missing 'options:dpdk-devargs'. The old 'dpdk' names are not supported.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 p+ -- set Interface p+ type=patch
ovs-vsctl: Error detected while setting up 'p+': p+: patch type requires valid 'peer' argument.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

$ ovs-vsctl add-port br0 gnv0 -- set Interface gnv0 type=geneve
ovs-vsctl: Error detected while setting up 'gnv0': gnv0: geneve type requires valid 'remote_ip' argument.  See ovs-vswitchd log for details.
ovs-vsctl: The default log directory is "/var/log/openvswitch/".

CC: Ciara Loftus <ciara.lof...@intel.com>
CC: Kevin Traynor <ktray...@redhat.com>
Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
---
 lib/netdev-dpdk.c | 27 ++
 lib/netdev-dummy.c|  3 +-
 lib/netdev-provider.h |  9 --
 lib/netdev-vport.c| 76 ++-
 lib/netdev.c  | 10 +--
 5 files changed, 84 insertions(+), 41 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0f02c4d74..1bcc27a62 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1132,7 +1132,7 @@ netdev_dpdk_lookup_by_port_id(int port_id)
 }
 
 static int
-netdev_dpdk_process_devargs(const char *devargs)
+netdev_dpdk_process_devargs(const char *devargs, char **errp)
 {
 uint8_t new_port_id = UINT8_MAX;
 
@@ -1145,7 +1145,7 @@ netdev_dpdk_process_devargs(const char *devargs)
 VLOG_INFO("Device '%s' attached to DPDK", devargs);
 } else {
 /* Attach unsuccessful */
-VLOG_INFO("Error attaching device '%s' to DPDK", devargs);
+VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs);
 return -1;
 }
 }
@@ -1184,7 +1184,8 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
 }
 
 static int
-netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
+netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
+   char **errp)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 bool rx_fc_en, tx_fc_en, autoneg;
@@ -1225,7 +1226,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
  * is valid */
 if (!(dev->devargs && !strcmp(dev->devargs, new_devargs)
&& rte_eth_dev_is_valid_port(dev->port_id))) {
-int new_port_id = netdev_dpdk_process_devargs(new_devargs);
+int new_port_id = netdev_dpdk_process_devargs(new_devargs, errp);
 if (!rte_eth_dev_is_valid_port(new_port_id)) {
 err = EINVAL;
 } else if (new_port_id == dev->port_id) {
@@ -1235,10 +1236,10 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
 struct netdev_dpdk *dup_dev;
 dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
 if (dup_dev) {
-VLOG_WARN("'%s' is trying to use device '%s' which is "
-  "already in use by '%s'.",
-  netdev_get_name(netdev), new_devargs,
-  netdev_get_name(_dev->up));
+VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
+

Re: [ovs-dev] [PATCH v3] dpif: Return ENODEV from dpif_port_query_by_*() if there's no port.

2017-01-06 Thread Daniele Di Proietto

On 06/01/2017 13:01, "Ben Pfaff" <b...@ovn.org> wrote:

>On Fri, Jan 06, 2017 at 12:42:35PM -0800, Daniele Di Proietto wrote:
>> bridge_delete_or_reconfigure() deletes every interface that's not dumped
>> by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
>> OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
>> calling port_query_by_name().  If port_query_by_name() returns an error,
>> the dump is interrupted.  If port_query_by_name() returns ENODEV, the
>> device doesn't exist and the dump can continue.
>> 
>> port_query_by_name() for the userspace datapath returns ENOENT instead
>> of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
>> handled correctly by port_dump_next().
>> 
>> dpif-netdev handles reconfiguration errors for an interface by deleting
>> it from the datapath, so it's possible that a device is missing. When this
>> happens we must make sure that port_dump_next() continues to dump other
>> devices, otherwise they will be deleted and the two layers will have an
>> inconsistent view.
>> 
>> This commit fixes the problem by returning ENODEV from the userspace
>> datapath if the port doesn't exist, and by documenting this clearly in
>> the dpif interfaces.
>> 
>> The problem was found while developing new code.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> ---
>> v3: Return ENODEV instead of ENOENT from dpif-netdev. Document that ENODEV
>> means that the port doesn't exist, other error numbers indicate problems.
>
>Acked-by: Ben Pfaff <b...@ovn.org>

Thanks!  Pushed to master
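
The dump semantics settled on here can be sketched as follows.  dump_ports()
and example_query() are hypothetical, not the ofproto-dpif code.  A query
returning ENODEV means "this port is not in the datapath", so the dump skips
the port and keeps going.  Any other error is a real failure and stops the
dump.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* 0 if present, ENODEV if the port does not exist, else a real error. */
typedef int (*query_fn)(const char *name);

/* Count ports present in the datapath.  ENODEV is skipped; any other error
 * aborts the dump and is reported through *errorp. */
static size_t
dump_ports(const char *const *names, size_t n, query_fn query, int *errorp)
{
    size_t found = 0;

    *errorp = 0;
    for (size_t i = 0; i < n; i++) {
        int error = query(names[i]);
        if (!error) {
            found++;
        } else if (error != ENODEV) {
            *errorp = error;
            break;
        }
    }
    return found;
}

/* Example query: "missing" was removed after a reconfiguration error;
 * "broken" simulates a genuine failure. */
static int
example_query(const char *name)
{
    if (!strcmp(name, "missing")) {
        return ENODEV;
    }
    if (!strcmp(name, "broken")) {
        return EIO;
    }
    return 0;
}
```

Because a missing port no longer interrupts the dump, the ports after it are
still reported, and the bridge layer does not delete them.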


Re: [ovs-dev] [PATCH v2] dpif: Return ENODEV from dpif_port_query_by_name() if there's no port.

2017-01-06 Thread Daniele Di Proietto

On 06/01/2017 11:34, "Ben Pfaff" <b...@ovn.org> wrote:

>On Fri, Jan 06, 2017 at 10:59:07AM -0800, Daniele Di Proietto wrote:
>> bridge_delete_or_reconfigure() deletes every interface that's not dumped
>> by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
>> OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
>> calling port_query_by_name().  If port_query_by_name() returns an error,
>> the dump is interrupted.  If port_query_by_name() returns ENODEV, the
>> device doesn't exist and the dump can continue.
>> 
>> port_query_by_name() for the userspace datapath returns ENOENT instead
>> of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
>> handled correctly by port_dump_next().
>
>Should port_query_by_name() for the userspace datapath return ENODEV,
>instead?

Sorry to waste your time on this.  Yes, that seems the more appropriate solution.

I decided to handle both ENODEV and ENOENT to be consistent with what we did in
the past, e.g. bee6b8bc16b1("dpif: Don't log warning for ENOENT with
dpif_port_exists().").

I suspected that ENOENT could only come from the userspace datapath, but I
wasn't too sure about that.

After looking at vport_cmd_get() and testing it, I couldn't find any ENOENT,
so we probably added all those special cases just for the userspace datapath.

How about the following v3?

https://mail.openvswitch.org/pipermail/ovs-dev/2017-January/327323.html

Thanks,

Daniele

>
>> This commit fixes the problem by translating ENOENT in ENODEV in
>> dpif_port_query_by_name().
>> 
>> dpif-netdev handles reconfiguration errors for an interface by deleting
>> it from the datapath, so it's possible that a device is missing. When this
>> happens we must make sure that port_dump_next() continues to dump other
>> devices, otherwise they will be deleted and the two layers will have an
>> inconsistent view.
>> 
>> The problem was found while developing new code.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>> ---
>> v2:
>> * Translate ENOENT into ENODEV in dpif_port_query_by_name(), instead of
>>   handling both in port_dump_next().
>
>Acked-by: Ben Pfaff <b...@ovn.org>


Re: [ovs-dev] [PATCH] ofproto-dpif: Continue port dump if a port is missing from dpif-netdev.

2017-01-06 Thread Daniele Di Proietto

On 06/01/2017 09:28, "Ben Pfaff" <b...@ovn.org> wrote:

>On Thu, Jan 05, 2017 at 08:37:26PM -0800, Daniele Di Proietto wrote:
>> bridge_delete_or_reconfigure() deletes every interface that's not dumped
>> by OFPROTO_PORT_FOR_EACH().  ofproto_dpif.c:port_dump_next(), used by
>> OFPROTO_PORT_FOR_EACH, checks if the ofport is in the datapath by
>> calling port_query_by_name().  If port_query_by_name() returns an error,
>> the dump is interrupted.  If port_query_by_name() returns ENODEV, the
>> device doesn't exist and the dump can continue.
>> 
>> port_query_by_name() for the userspace datapath returns ENOENT instead
>> of ENODEV.  This is expected by dpif_port_query_by_name(), but it's not
>> handled correctly by port_dump_next().
>> 
>> This commit fixes the problem by handling ENOENT like ENODEV.
>> 
>> dpif-netdev handles reconfiguration errors for an interface by deleting
>> it from the datapath, so it's possible that a device is missing. When this
>> happens we must make sure that port_dump_next() continues to dump other
>> devices otherwise they will be deleted and the two layers will have an
>> inconsistent view.
>> 
>> The problem was found while developing new code.
>> 
>> Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com>
>
>I'm not sure whether there's a difference in meaning between ENOENT and
>ENODEV when it comes from these functions.  I wonder whether the dpif
>layer should translate one of them into the other, for callers'
>convenience.

Good idea, let me send a v2.

Thanks,

Daniele

>
>Acked-by: Ben Pfaff <b...@ovn.org>
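
Ben's suggestion, translating one code into the other at the dpif layer, can
be sketched as a thin wrapper.  All names below are hypothetical.  This is the
v2 approach.  The v3 that was eventually pushed instead makes dpif-netdev
return ENODEV directly.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical provider callbacks standing in for datapath implementations. */
static int
provider_missing_enoent(const char *name)
{
    (void) name;
    return ENOENT;   /* Like dpif-netdev before the fix. */
}

static int
provider_io_error(const char *name)
{
    (void) name;
    return EIO;
}

/* Generic-layer wrapper: fold ENOENT into ENODEV so callers check one code;
 * every other result, including success, passes through unchanged. */
static int
query_port(int (*query)(const char *name), const char *name)
{
    int error = query(name);
    return error == ENOENT ? ENODEV : error;
}
```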

