Re: [ovs-dev] [External Mail] Re: [External] Re: [ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

2022-09-23 Thread
Hi Peng,

Right, I have also met this issue and have been wondering about the sequence
that leads to this inconsistency, so I would like to hear about your "new cause".

Anyway, I believe the patch below should fix this issue.
[ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops

Br,
Zhike

From: ".贺鹏" 
Date: Friday, September 23, 2022 at 8:59 PM
To: 王志克 
Cc: "ovs-dev@openvswitch.org" , "d...@openvswitch.org" 

Subject: [External Mail] Re: [External] Re: [ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

JD Security Tips: Please do not click on links or open attachments unless you 
trust the sender and know the content is safe.


Hi, Zhike,

After receiving your email, I became curious about this code and did more
investigation on it.

I found some problems with the code, and I now believe this inconsistent
processing is NOT the root cause of the inconsistent actions between the ukey
and the datapath. I have found a new cause for that, but given the complex race
between the PMD and the revalidator, I can only hope that this time I am right.

But before that, why are you interested in this patch? Have you found the same 
issue in your environment?




On Thu, Sep 22, 2022 at 6:54 PM .贺鹏 <hepeng.0...@bytedance.com> wrote:
Hi, Zhike,

It's difficult to give a very clear sequence of events for how this inconsistency
happens, but I can give you more details.

This is observed in our production environment. The correct megaflow should
encapsulate packets with a VXLAN header and send them out, but the action is drop.
This usually happens because the neighbor info is not yet available at the moment
the upcall happens.

Normally, the drop action is ephemeral, and a revalidator will later modify the
megaflow's action into tnl_push.

But there are a few cases, happening only once or twice a year, where the drop
action is never replaced by tnl_push.

just like in the commits mentioned,

"The coverage command shows revalidators have dumped several times,
however the correct actions are not set. This implies that the ukey's
action does not equal to the meagaflow's, i.e. revalidators think the underlying

megaflow's actions are correct however they are not."



I do not know how this happened, but I do think this inconsistent processing 
could be one of the reasons.

Even if there is no such bug, I think making the processing consistent is necessary.
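
For anyone hitting the same symptom, a minimal way to compare the two views
(standard appctl commands; the bridge name and the flow are placeholders):

  ovs-appctl dpctl/dump-flows -m          # actions actually installed in the datapath
  ovs-appctl ofproto/trace br-int <flow>  # actions the OpenFlow tables translate to
  ovs-appctl revalidator/purge            # drop all ukeys so megaflows are re-installed

If the dumped datapath actions disagree with the trace output, and only
revalidator/purge fixes it, you are likely seeing this ukey/megaflow mismatch.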





On Wed, Sep 21, 2022 at 5:57 PM 王志克 <wangzh...@jd.com> wrote:
Hi Hepeng,

Can you please explain the sequence in which this inconsistency could happen?
Why do you believe the current actions in the existing netdev_flow are old?

Thanks.

Br,
wangzhike





Re: [ovs-dev] [ovs-dev, ovs-dev, v2, 4/4] dpif-netdev: fix inconsistent processing between ukey and megaflow

2022-09-21 Thread
Hi Hepeng,

Can you please explain the sequence in which this inconsistency could happen?
Why do you believe the current actions in the existing netdev_flow are old?

Thanks.

Br,
wangzhike




*
[ovs-dev,ovs-dev,v2,4/4] dpif-netdev: fix inconsistent processing between ukey 
and megaflow
Message ID: 20220604151857.66550-4-hepeng.0...@bytedance.com
State: New
Series: [ovs-dev,ovs-dev,v2,1/4] ofproto-dpif-upcall: fix push_dp_ops

Checks
ovsrobot/apply-robot: warning (apply and check: warning)
ovsrobot/github-robot-_Build_and_Test: success (github build: passed)
ovsrobot/intel-ovs-compilation: success (test: success)

Commit Message
Peng He, June 4, 2022, 3:18 p.m. UTC

When PMDs perform upcalls, the newly generated ukey will replace
the old one; however, the newly generated megaflow will be discarded
and the old one reused, without checking whether the actions of the new
and old megaflows are equal.

We observe in the production environment that sometimes a megaflow
with wrong actions keeps staying in the datapath. The coverage command
shows revalidators have dumped several times, however the correct
actions are not set. This implies that the ukey's actions do not
equal the megaflow's, i.e. revalidators think the underlying
megaflow's actions are correct however they are not.

We also check the megaflow using the ofproto/trace command, and the
actions do not match the ones in the actual megaflow. By
performing a revalidator/purge command, the right actions are set.



Signed-off-by: Peng He 

---

 lib/dpif-netdev.c | 17 ++++++++++++++++-

 1 file changed, 16 insertions(+), 1 deletion(-)

Comments
0-day Robot, June 4, 2022, 3:44 p.m. UTC | #1

Bleep bloop.  Greetings Peng He, I am a robot and I have tried out your patch.

Thanks for your contribution.



I encountered some error that I wasn't expecting.  See the details below.





checkpatch:

ERROR: Author Peng He  needs to sign off.

WARNING: Unexpected sign-offs from developers who are not authors or co-authors 
or committers: Peng He 

Lines checked: 58, Warnings: 1, Errors: 1





Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com



Thanks,

0-day Robot
Patch 1638948

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index ff57b3961..985c25c58 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -8305,7 +8305,22 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
      * to be locking revalidators out of making flow modifications. */
     ovs_mutex_lock(&pmd->flow_mutex);
     netdev_flow = dp_netdev_pmd_lookup_flow(pmd, key, NULL);
-    if (OVS_LIKELY(!netdev_flow)) {
+    if (OVS_UNLIKELY(netdev_flow)) {
+        struct dp_netdev_actions *old_act =
+            dp_netdev_flow_get_actions(netdev_flow);
+
+        if ((add_actions->size != old_act->size) ||
+            memcmp(old_act->actions, add_actions->data,
+                   add_actions->size)) {
+
+            struct dp_netdev_actions *new_act =
+                dp_netdev_actions_create(add_actions->data,
+                                         add_actions->size);
+
+            ovsrcu_set(&netdev_flow->actions, new_act);
+            ovsrcu_postpone(dp_netdev_actions_free, old_act);
+        }
+    } else {
         netdev_flow = dp_netdev_flow_add(pmd, &match, &ufid,
                                          add_actions->data,
                                          add_actions->size, orig_in_port);


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] vhost: return -EAGAIN during unregistering vhost if it is busy.

2020-05-05 Thread
No, it is a different issue.


The deadlock mentioned in this patch is caused by a blocking function (like
ovsrcu_synchronize) in the application (like OVS).
With this patch, the application is the one that needs to break the logical deadlock.
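
To make that concrete, a minimal sketch of the application-side handling
(OVS-style code; the function name and the socket path variable are hypothetical,
and error handling is omitted):

#include <rte_vhost.h>      /* rte_vhost_driver_unregister() */
#include "ovs-rcu.h"        /* ovsrcu_quiesce() */

static void
vhost_unregister_with_retry(const char *path)
{
    /* With this patch, unregister returns -EAGAIN while a callback such as
     * destroy_device() is still running on the vhost event thread.  Going
     * quiescent here lets that thread's ovsrcu_synchronize() complete, after
     * which the retry can succeed. */
    while (rte_vhost_driver_unregister(path) == -EAGAIN) {
        ovsrcu_quiesce();
    }
}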

















At 2020-04-27 16:09:31, "Maxime Coquelin"  wrote:
>
>
>On 3/18/20 4:31 AM, 王志克 wrote:
>> Involve openvswitch group since this fix is highly coupled with OVS.
>> welcome comment.
>> At 2020-03-12 17:57:19, "Zhike Wang"  wrote:
>>> The vhost_user_read_cb() and rte_vhost_driver_unregister()
>>> can be called at the same time by 2 threads, and may lead to deadlock.
>>> Eg thread1 calls 
>>> vhost_user_read_cb()->vhost_user_get_vring_base()->destroy_device(),
>>> then thread2 calls rte_vhost_driver_unregister(), and will retry the 
>>> fdset_try_del() in loop.
>>>
>>> Some application implements destroy_device() as a blocking function, eg
>>> OVS calls ovsrcu_synchronize() insides destroy_device(). As a result,
>>> thread1(eg vhost_events) is blocked to wait quiesce of thread2(eg 
>>> ovs-vswitchd),
>>> and thread2 is in a loop to wait thread1 to give up the use of the vhost fd,
>>> then leads to deadlock.
>>>
>>> It is better to return -EAGAIN to application, who will decide how to handle
>>> (eg OVS can call ovsrcu_quiesce() and then retry).
>>>
>>> Signed-off-by: Zhike Wang 
>>> ---
>>> lib/librte_vhost/rte_vhost.h | 4 +++-
>>> lib/librte_vhost/socket.c| 8 
>>> 2 files changed, 7 insertions(+), 5 deletions(-)
>
>
>Isn't it fixed with below commit that landed into DPDK v20.02?
>
>commit 5efb18e85f7fdb436d3e56591656051c16802066
>Author: Maxime Coquelin 
>Date:   Tue Jan 14 19:53:57 2020 +0100
>
>vhost: fix deadlock on port deletion
>
>If the vhost-user application (e.g. OVS) deletes the vhost-user
>port while Qemu sends a vhost-user request, a deadlock can
>happen if the request handler tries to acquire vhost-user's
>global mutex, which is also locked by the vhost-user port
>deletion API (rte_vhost_driver_unregister).
>
>This patch prevents the deadlock by making
>rte_vhost_driver_unregister() to release the mutex and try
>again if a request is being handled to give a chance to
>the request handler to complete.
>
>Fixes: 8b4b949144b8 ("vhost: fix dead lock on closing in server mode")
>Fixes: 5fbb3941da9f ("vhost: introduce driver features related APIs")
>Cc: sta...@dpdk.org
>
>Signed-off-by: Maxime Coquelin 
>Reviewed-by: Tiwei Bie 
>Acked-by: Eelco Chaudron 
>
>>> diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
>>> index c7b619a..276db11 100644
>>> --- a/lib/librte_vhost/rte_vhost.h
>>> +++ b/lib/librte_vhost/rte_vhost.h
>>> @@ -389,7 +389,9 @@ void rte_vhost_log_used_vring(int vid, uint16_t 
>>> vring_idx,
>>>  */
>>> int rte_vhost_driver_register(const char *path, uint64_t flags);
>>>
>>> -/* Unregister vhost driver. This is only meaningful to vhost user. */
>>> +/* Unregister vhost driver. This is only meaningful to vhost user.
>>> + * Return -EAGAIN if device is busy, and leave it to be handled by 
>>> application.
>>> + */
>>> int rte_vhost_driver_unregister(const char *path);
>>>
>>> /**
>>> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
>>> index 7c80121..a75a3f6 100644
>>> --- a/lib/librte_vhost/socket.c
>>> +++ b/lib/librte_vhost/socket.c
>>> @@ -1027,7 +1027,8 @@ struct vhost_user_reconnect_list {
>>> }
>>>
>>> /**
>>> - * Unregister the specified vhost socket
>>> + * Unregister the specified vhost socket.
>>> + * Return -EAGAIN if device is busy, and leave it to be handled by 
>>> application.
>>>  */
>>> int
>>> rte_vhost_driver_unregister(const char *path)
>>> @@ -1039,7 +1040,6 @@ struct vhost_user_reconnect_list {
>>> if (path == NULL)
>>> return -1;
>>>
>>> -again:
>>> pthread_mutex_lock(&vhost_user.mutex);
>>>
>>> for (i = 0; i < vhost_user.vsocket_cnt; i++) {
>>> @@ -1063,7 +1063,7 @@ struct vhost_user_reconnect_list {
>>> pthread_mutex_unlock(
>>> &vsocket->conn_mutex);
>>> pthread_mutex_unlock(&vhost_user.mutex);
>>> -   

Re: [ovs-dev] [PATCH] vhost: return -EAGAIN during unregistering vhost if it is busy.

2020-03-17 Thread
Involving the openvswitch group since this fix is highly coupled with OVS.
Comments welcome.
At 2020-03-12 17:57:19, "Zhike Wang"  wrote:
>The vhost_user_read_cb() and rte_vhost_driver_unregister()
>can be called at the same time by 2 threads, and may lead to deadlock.
>Eg thread1 calls 
>vhost_user_read_cb()->vhost_user_get_vring_base()->destroy_device(),
>then thread2 calls rte_vhost_driver_unregister(), and will retry the 
>fdset_try_del() in loop.
>
>Some application implements destroy_device() as a blocking function, eg
>OVS calls ovsrcu_synchronize() insides destroy_device(). As a result,
>thread1(eg vhost_events) is blocked to wait quiesce of thread2(eg 
>ovs-vswitchd),
>and thread2 is in a loop to wait thread1 to give up the use of the vhost fd,
>then leads to deadlock.
>
>It is better to return -EAGAIN to application, who will decide how to handle
>(eg OVS can call ovsrcu_quiesce() and then retry).
>
>Signed-off-by: Zhike Wang 
>---
> lib/librte_vhost/rte_vhost.h | 4 +++-
> lib/librte_vhost/socket.c| 8 
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
>diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h
>index c7b619a..276db11 100644
>--- a/lib/librte_vhost/rte_vhost.h
>+++ b/lib/librte_vhost/rte_vhost.h
>@@ -389,7 +389,9 @@ void rte_vhost_log_used_vring(int vid, uint16_t vring_idx,
>  */
> int rte_vhost_driver_register(const char *path, uint64_t flags);
> 
>-/* Unregister vhost driver. This is only meaningful to vhost user. */
>+/* Unregister vhost driver. This is only meaningful to vhost user.
>+ * Return -EAGAIN if device is busy, and leave it to be handled by 
>application.
>+ */
> int rte_vhost_driver_unregister(const char *path);
> 
> /**
>diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
>index 7c80121..a75a3f6 100644
>--- a/lib/librte_vhost/socket.c
>+++ b/lib/librte_vhost/socket.c
>@@ -1027,7 +1027,8 @@ struct vhost_user_reconnect_list {
> }
> 
> /**
>- * Unregister the specified vhost socket
>+ * Unregister the specified vhost socket.
>+ * Return -EAGAIN if device is busy, and leave it to be handled by 
>application.
>  */
> int
> rte_vhost_driver_unregister(const char *path)
>@@ -1039,7 +1040,6 @@ struct vhost_user_reconnect_list {
>   if (path == NULL)
>   return -1;
> 
>-again:
>   pthread_mutex_lock(&vhost_user.mutex);
> 
>   for (i = 0; i < vhost_user.vsocket_cnt; i++) {
>@@ -1063,7 +1063,7 @@ struct vhost_user_reconnect_list {
>   pthread_mutex_unlock(
>   &vsocket->conn_mutex);
>   pthread_mutex_unlock(&vhost_user.mutex);
>-  goto again;
>+  return -EAGAIN;
>   }
> 
>   VHOST_LOG_CONFIG(INFO,
>@@ -1085,7 +1085,7 @@ struct vhost_user_reconnect_list {
>   if (fdset_try_del(&vhost_user.fdset,
>   vsocket->socket_fd) == -1) {
>   pthread_mutex_unlock(&vhost_user.mutex);
>-  goto again;
>+  return -EAGAIN;
>   }
> 
>   close(vsocket->socket_fd);
>-- 
>1.8.3.1
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [RFC]netdev-offload-dpdk: merged single table for HW offloading

2019-12-30 Thread via dev
We would like to introduce a HW offloading solution for the scenario where one
packet goes through the DPDK OVS pipeline multiple times with a recirculation
action. We call it merged-single-table HW offloading.

The standard use case is to support conntrack with HW offloading. For example,
the packet matches flow 1 with a CT-and-recirculate action, then it matches flow 2
with an action to forward to a certain port. This means the packet needs to go
through multiple SW logical tables. When designing such HW offloading, it would be
straightforward to have a 1:1 mapping between HW tables and SW logical tables.

However, there are some difficulties in doing so:
1) Not all NICs support multiple HW tables.
2) When a lookup miss happens in a non-first table, it may be hard for the CPU to
continue handling the packet, since the packet may already have been changed in
previous tables, e.g. an IP address change.
3) It is hard for HW to maintain a finite state machine, e.g. to maintain conntrack state.
4) SW may introduce new actions that the current HW cannot support, like dp_hash.

Our solution is to create only one single match/action flow for the HW. The details
are:
1) When a packet goes through the SW logical tables (miniflow/SFC/megaflows and
so on), it records all megaflows it hits and some metadata during recirculation.
Some validation is needed to check the match/action of each flow. If validation
fails, the recording is stopped. One special validation is that we only offload
traffic with CT state EST.
2) When the packet hits one megaflow that has a forward/drop action and no
recirculation action, all the flow info and metadata are sent to an offloading
thread.
3) The offloading thread merges the flows and actions into one single
match/action flow, and offloads this merged flow to the HW. During the merging,
the match may be expanded with extra items; for the CT case, we expand the
5-tuple into matchers (see the simplified example after this list).
4) Another aging thread is created to keep HW and SW synchronized.
It periodically fetches stats from the HW for offloaded flows, and contributes
them to the megaflows' and conntrack's stats. If some megaflow is dead, or the
conntrack entry is aged out, the aging thread also deletes the merged rule
from the HW.
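
To make the merging concrete, a simplified, hypothetical example (field values
are illustrative only):

  SW megaflow A: in_port(vm1),eth_type(0x0800),ip   actions:ct(zone=8),recirc(0x1)
  SW megaflow B: recirc_id(0x1),ct_state(+est),ip   actions:tnl_push(...),output(p0)

  Merged HW rule: match the expanded 5-tuple taken from the EST conntrack entry
  (src/dst IP, protocol, src/dst ports) and apply only the final actions
  tnl_push(...),output(p0); the intermediate ct/recirc steps are folded away.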

Using this solution, we keep the HW simple (meaning less coupling/dependency on 
HW), and maintain the flexibility in SW.

The solution is already massively deployed in our production environment and
works reliably.

We would like to get some feedback before we submit all patches.

Br,

Zhike Wang 
JDCloud, Product Development, IaaS   

Mobile/+86 13466719566
E- mail/wangzh...@jd.com
Address/5F Building A,North-Star Century Center,8 Beichen West Street,Chaoyang 
District Beijing
Https://JDCloud.com




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev




Re: [ovs-dev] [PATCH] conntrack: Fix tcp payload length in case multi-segments.

2019-11-10 Thread via dev
Hi Darrell,

In the TSO case, the packet may use a multi-segment mbuf, and I do not think we need
to linearize it. In that case, we can NOT use pointer arithmetic to calculate the TCP
payload length.

Br,

Zhike Wang 
JDCloud, Product Development, IaaS   

Mobile/+86 13466719566
E- mail/wangzh...@jd.com
Address/5F Building A,North-Star Century Center,8 Beichen West Street,Chaoyang 
District Beijing
Https://JDCloud.com



From: Darrell Ball [mailto:dlu...@gmail.com] 
Sent: Saturday, November 09, 2019 8:12 AM
To: Zhike Wang
Cc: ovs dev; 王志克
Subject: Re: [ovs-dev] [PATCH] conntrack: Fix tcp payload length in case 
multi-segments.

Thanks for the patch 

Would you mind describing the use case that this patch is aiming to support ?

On Fri, Nov 8, 2019 at 1:23 AM Zhike Wang  wrote:
Signed-off-by: Zhike Wang 
---
 lib/conntrack-private.h | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index 590f139..1d21f6e 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -233,13 +233,17 @@ conn_update_expiration(struct conntrack *ct, struct conn 
*conn,
 static inline uint32_t
 tcp_payload_length(struct dp_packet *pkt)
 {
-    const char *tcp_payload = dp_packet_get_tcp_payload(pkt);
-    if (tcp_payload) {
-        return ((char *) dp_packet_tail(pkt) - dp_packet_l2_pad_size(pkt)
-                - tcp_payload);
-    } else {
-        return 0;
+    size_t l4_size = dp_packet_l4_size(pkt);
+
+    if (OVS_LIKELY(l4_size >= TCP_HEADER_LEN)) {
+        struct tcp_header *tcp = dp_packet_l4(pkt);
+        int tcp_len = TCP_OFFSET(tcp->tcp_ctl) * 4;
+
+        if (OVS_LIKELY(tcp_len >= TCP_HEADER_LEN && tcp_len <= l4_size)) {
+            return (l4_size - tcp_len);
+        }

Maybe I missed something, but it looks like the same calculation is arrived at.
 
     }
+    return 0;
 }

 #endif /* conntrack-private.h */
-- 
1.8.3.1


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [patch v2] conntrack: Add option to disable TCP sequence checking.

2019-06-10 Thread
Hi,



I would like to describe our scenario for using 'no-tcp-seq-chk'.



We at JDCloud design conntrack hardware offloading as follows:

1. SW maintains the conntrack state and timer. So packets that would impact 
conntrack state/timer should be sent to CPU, like TCP FIN/RST. In this way, we 
can clean up the connection ASAP.

2. HW has no idea about the conntrack, and just forwards the packet according 
to HW rule.

3. SW only inserts HW rule when the conntrack state is EST.

4. SW polls the HW periodically to know whether the HW rule is hit or not. If 
HW rule is idle for certain time, it will be deleted by SW.



In this design, most TCP data packets go through the HW, so the conntrack in SW
will not update the sequence numbers. Eventually the TCP FIN/RST are sent to the
CPU, and these should NOT be sequence-checked.
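
As a rough, hypothetical illustration of why: if the HW forwards 1 GB of an EST
connection without the SW seeing it, the sequence state kept by the SW conntrack
lags the live sequence numbers by about 10^9, so when the FIN or RST finally
reaches the CPU its seq/ack fall far outside the window the SW is tracking and
the packet would be marked invalid unless sequence checking is disabled.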



Best regards,



Zhike Wang



Date: Mon, 10 Jun 2019 09:51:25 -0700

From: Ben Pfaff <b...@ovn.org>

To: Darrell Ball <dlu...@gmail.com>

Cc: d...@openvswitch.org

Subject: Re: [ovs-dev] [patch v2] conntrack: Add option to disable TCP

 sequence checking.

Message-ID: 
<20190610165125.gg28...@ovn.org>

Content-Type: text/plain; charset=us-ascii



On Sun, Jun 09, 2019 at 07:35:09AM -0700, Darrell Ball wrote:

> This may be needed in some special cases, such as to support some

> hardware offload implementations.

>

> Reported-at: 
> https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359188.html

> Signed-off-by: Darrell Ball <dlu...@gmail.com>

> ---

>

> v2: Per particular requirement, support  'no-tcp-seq-chk' rather than

> 'liberal' mode.

>

> Add some debug counters.



I'm not sure whether an ovs-appctl command is the best way for users to

enable and disable this.  It means that it is difficult for an OpenFlow

controller to do it, since those commands aren't exposed via OpenFlow or

OVSDB.



The documentation says that sequence checking should only be disabled if

absolutely necessary.  If you have an example of such a case, it would

be helpful to add it to the documentation.


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-discuss] crash when restart openvswitch with huge vxlan traffic running

2018-12-27 Thread
Thanks a lot.

I confirm your fix works.

Br,

Zhike Wang 
JDCloud, Product Development, IaaS   

Mobile/+86 13466719566
E- mail/wangzh...@jd.com
Address/5F Building A,North-Star Century Center,8 Beichen West Street,Chaoyang 
District Beijing
Https://JDCloud.com




-Original Message-
From: Lorenzo Bianconi [mailto:lorenzo.bianc...@redhat.com] 
Sent: Friday, December 28, 2018 4:33 AM
To: Ben Pfaff
Cc: 王志克; Gregory Rose; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
Subject: Re: [ovs-discuss] crash when restart openvswitch with huge vxlan 
traffic running


> Greg, this is a kernel issue.  If you have the time, will you take a
> look at it sometime?
>

Hi all,

I worked on a pretty similar issue a couple of weeks ago. Could you
please take a look to the commit below (it is already in Linus's
tree):

commit 8e1da73acded4751a93d4166458a7e640f37d26c
Author: Lorenzo Bianconi 
Date:   Wed Dec 19 23:23:00 2018 +0100

gro_cell: add napi_disable in gro_cells_destroy

   Add napi_disable routine in gro_cells_destroy since starting from
   commit c42858eaf492 ("gro_cells: remove spinlock protecting receive
   queues") gro_cell_poll and gro_cells_destroy can run concurrently on
   napi_skbs list producing a kernel Oops if the tunnel interface is
   removed while gro_cell_poll is running. The following Oops has been
   triggered removing a vxlan device while the interface is receiving
   traffic

Regards,
Lorenzo

> On Thu, Dec 20, 2018 at 12:42:43PM +, 王志克 wrote:
> > Hi All,
> >
> > I did below test, and found system crash, does anyone knows whether there 
> > are already some fix for it?
> >
> > Setup:
> > CentOS7.4 3.10.0-693.el7.x86_64,
> > OVS: 2.10.1
> >
> > Step:
> > 1.  Build OVS only for userspace, and reuse kernel-builtin openvswitch 
> > module.
> > 2.  On Host1, create 1 vxlan interface and add 1 VF_rep to OVS.
> > 3.  Attach the VF to one VM, and the VM will do 5 tuples swap using DPDK 
> > app.
> > 4.  Using a traffic generator to send huge traffic (7 Mpps with several k 
> > connections) to the Host1 PF.
> > 5.  The OVS rules are configured as below.
> >
> > VM1_PORTNAME=$1
> > VXLAN_PORTNAME=$2
> > VM1_PORT=$(ovs-vsctl list interface | grep $VM1_PORTNAME -A1 | grep ofport 
> > | sed 's/ofport *: \([0-9]*\)/\1/g')
> > VXLAN_PORT=$(ovs-vsctl list interface | grep $VXLAN_PORTNAME -A1 | grep 
> > ofport | sed 's/ofport *: \([0-9]*\)/\1/g')
> > ZONE=8
> > ovs-ofctl del-flows ovs-sriov
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=1000 table=0,arp, 
> > actions=NORMAL"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=100 
> > table=0,ip,in_port=$VM1_PORT,action=set_field:$VM1_PORT->reg6,goto_table:5"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=100 
> > table=0,ip,in_port=$VXLAN_PORT, tun_id=0x242, 
> > action=set_field:$VXLAN_PORT->reg6,set_field:$VM1_PORT->reg7,goto_table:5"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=5, priority=100, 
> > ip,actions=ct(table=10,zone=$ZONE)"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=10, 
> > priority=100,ip,ct_state=-new+est-rel-inv+trk actions= goto_table:15"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=10, 
> > priority=100,ip,ct_state=-new-est-rel+inv+trk actions=drop"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=10, 
> > priority=100,ip,ct_state=-new-est-rel-inv-trk actions=drop"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=10, 
> > priority=100,ip,ct_state=+new-rel-inv+trk actions= 
> > ct(commit,table=15,zone=$ZONE)"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=100 table=15,ip, 
> > in_port=$VM1_PORT, 
> > action=set_field:0x242->tun_id,set_field:$VXLAN_PORT->reg7,goto_table:20"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "priority=100 table=15,ip, 
> > in_port=$VXLAN_PORT, actions=goto_table:20"
> >
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=20, priority=100, 
> > ip,action=output:NXM_NX_REG7[0..15]"
> > ovs-ofctl add-flow ovs-sriov -O openflow13 "table=200, 
> > priority=100,action=drop"
> > 6. Execute “systemctl restart openvswitch” several times, then crash.
> >
> > Crash stack (2 kinds):
> > One
> > [  575.459905] device vxlan_sys_4789 left promiscuous mode
> > [  575.460103] BUG: unable to han

Re: [ovs-dev] [PATCH] memory: kill ovs-vswitchd under super

2018-02-04 Thread
Hi William/Ben,

I am working on how to reproduce it, but do not yet have a clue.

Some findings when the memory usage is large:

1) Usually the flow count has reached the limit at some point, for example:
netdev@ovs-netdev:
flows : (current 100) (avg 104) (max 206205) (limit 199000)
dump duration : 1ms
ufid enabled : true

12: (keys 100)
2) There are attacks from the VM port, like a SYN flood with varying source TCP
ports.

3) There are lots of fine-grained flows in the datapath, e.g. including specific TCP
src and dst ports, even though I do not specify TCP fields in my OpenFlow rules. I
cannot understand this. Any idea?
ovs-appctl dpif/dump-flows br0

@Ben,
If there is no memory leak and this is caused by memory fragmentation, I believe we
can use a DPDK memory pool instead of malloc()/free() here and there.

Br,
Zhike Wang
-Original Message-
From: William Tu [mailto:u9012...@gmail.com] 
Sent: Saturday, February 03, 2018 1:46 AM
To: Ben Pfaff
Cc: 王志克; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev][PATCH] memory: kill ovs-vswitchd under super

Hi Zhike,

On Fri, Feb 2, 2018 at 7:48 AM, Ben Pfaff <b...@ovn.org> wrote:
> On Fri, Feb 02, 2018 at 12:37:58PM +, 王志克 wrote:
>> I also found that if once theres are lots of flows, the memory (RSS) usage 
>> of OVS process would be quite high, 2~3GB. Even then the flows disappear 
>> later, the memory still keeps.

Are you able to reproduce the issue?
There might be a memory leak around cls_rule, miniflow/minimatch allocation.
Can you share more information about your setup?

Thanks!
William

>>
>> I am not sure how many people notices this, but if indeed OVS has such 
>> defect, I guess this should be critical blocker.
>
> This is normal behavior of the C library malloc implementation.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] memory: kill ovs-vswitchd under super

2018-02-02 Thread
Hi,

I also found that once there are lots of flows, the memory (RSS) usage of
the OVS process becomes quite high, 2~3 GB. Even when the flows disappear later,
the memory usage stays.

I am not sure how many people have noticed this, but if OVS indeed has such a defect,
I guess this should be a critical blocker.

I would like to know whether others have met such an issue, and any clues if
possible.

Br,
Wang Zhike

--

Message: 1
Date: Sat, 18 Nov 2017 10:07:34 -0800
From: Ben Pfaff 
To: ovs-dev@openvswitch.org,William Tu 
Subject: Re: [ovs-dev] [PATCH] memory: kill ovs-vswitchd under super
high memory usage.
Message-ID: 
Content-Type: text/plain; charset=utf-8

On November 16, 2017 10:49:16 PM PST, William Tu  wrote:
>When deploying OVS on a large scale testbed, we occationally see OVS
>gets killed by the oom (out-of-memory) killer, after installing 100k
>rules and seeing ovs-vswitchd consumes more than 4GB of memory.
>Unfortunately, there is no better way to debug and root cause the
>memory
>leak.  The patch tries to add heuristic about the memory consumption
>of numbers of rules and the memory usage (typically 1-2 kB per rule)
>and set an upper bound for the memory usage of ovs-vswitchd.  If the
>memory usage, rss (resident set size), is larger than x16 num_rules,
>we kill the ovs-vswitchd with SIGSEGV, hoping to generate coredump
>file to help debugging.
>
>Signed-off-by: William Tu 
>---
> lib/memory.c | 26 ++
> 1 file changed, 26 insertions(+)
>
>diff --git a/lib/memory.c b/lib/memory.c
>index da97476c6a45..75cce6e5dcc3 100644
>--- a/lib/memory.c
>+++ b/lib/memory.c
>@@ -25,6 +25,7 @@
> #include "timeval.h"
> #include "unixctl.h"
> #include "openvswitch/vlog.h"
>+#include 
> 
> VLOG_DEFINE_THIS_MODULE(memory);
> 
>@@ -110,6 +111,27 @@ memory_should_report(void)
> }
> 
> static void
>+check_memory_usage(unsigned int num_rules)
>+{
>+struct rusage usage;
>+unsigned long int rss;
>+
>+getrusage(RUSAGE_SELF, &usage);
>+rss = (unsigned long int) usage.ru_maxrss; /* in kilobytes */
>+
>+/* Typically a rule takes about 1-2 kilobytes of memory.  If the
>rss
>+ * (resident set size) is larger than 1GB and x16 of num_rules, we
>+ * might have a memory leak.  Thus, kill it with SIGSEGV to
>generate a
>+ * coredump.
>+ */
>+if (rss > 1024 * 1024 && rss > num_rules * 16) {
>+VLOG_ERR("Unexpected high memory usage of %lu kB,"
>+ " rules %u killed with SIGSEGV", rss, num_rules);
>+raise(SIGSEGV);
>+}
>+}
>+
>+static void
> compose_report(const struct simap *usage, struct ds *s)
> {
> const struct simap_node **nodes = simap_sort(usage);
>@@ -120,6 +142,10 @@ compose_report(const struct simap *usage, struct
>ds *s)
> const struct simap_node *node = nodes[i];
> 
> ds_put_format(s, "%s:%u ", node->name, node->data);
>+
>+if (!strcmp(node->name, "rules")) {
>+check_memory_usage(node->data);
>+  }
> }
> ds_chomp(s, ' ');
> free(nodes);
>-- 
>2.7.4
>
>___
>dev mailing list
>d...@openvswitch.org
>https://mail.openvswitch.org/mailman/listinfo/ovs-dev

I know I suggested this but I didn't mean it as something that we'd carry in 
the tree but only as a temporary patch while we're trying to track down the 
leak.

--
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Question about Rx mergeable buffer

2018-01-23 Thread
Hi,

I have a question about the RX mergeable buffer feature.
The text below mentions that setting mrg_rxbuf=off can improve performance. So question 1:
how much does it affect throughput?
*
Rx Mergeable Buffers

Rx mergeable buffers is a virtio feature that allows chaining of multiple 
virtio descriptors to handle large packet sizes. Large packets are handled by 
reserving and chaining multiple free descriptors together. Mergeable buffer 
support is negotiated between the virtio driver and virtio device and is 
supported by the DPDK vhost library. This behavior is supported and enabled by 
default, however in the case where the user knows that rx mergeable buffers are 
not needed i.e. jumbo frames are not needed, it can be forced off by adding 
mrg_rxbuf=off to the QEMU command line options. By not reserving multiple 
chains of descriptors it will make more individual virtio descriptors available 
for rx to the guest using dpdkvhost ports and this can improve performance.
***

http://docs.openvswitch.org/en/latest/howto/dpdk/?highlight=mrg_rxbuf

It mentions that mrg_rxbuf must be set to on in order to support jumbo frames.

So question 2: if I set the MTU to 2000, is mrg_rxbuf also a MUST? Is there a
threshold at which to switch the configuration?


Some additional configuration is needed to take advantage of jumbo frames with 
vHost ports:
mergeable buffers must be enabled for vHost ports, as demonstrated in the QEMU 
command line snippet below:”.
**

BR,
Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS+DPDK: deadlock and race condition, which leads to OVS deadlock or crash

2018-01-23 Thread
From: Yuanhan Liu [mailto:y...@fridaylinux.org]
Sent: Thursday, January 18, 2018 10:04 PM
To: 王志克
Cc: d...@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of 
conn_mutex

Hi,

Apologize for late review.

On Tue, Jan 02, 2018 at 02:08:36AM -0800, zhike wang wrote:
> From: wang zhike <wangzh...@jd.com>
> 
> v3:
> * Fix duplicate variable name, which leads to unexpected memory write.
> v2:
> * Move fdset_del before conn destroy.
> * Fix coding style.

Note that we prefer to put the change logs after "---" below Signed-off-by,
so that those change logs won't be tracked in the git log history.

> This patch fixes below race condition:
> 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex
>->fdset_del->loop to check fd.busy.
> 2. another thread calls fdset_event_dispatch, and the busy flag is
>changed AFTER handling on the fd, i.e, rcb(). However, the rcb,
>such as vhost_user_read_cb() would try to retrieve the conn_mutex.
> 
> So issue is that the 1st thread will loop check the flag while holding
> the mutex, while the 2nd thread would be blocked by mutex and can not
> change the flag. Then dead lock is observed.

I then would change the title to "vhost: fix deadlock".

I'm also keen to know how do you reproduce this issue with real-life
APP (say ovs) and how easy it is for reproduce.

> Signed-off-by: zhike wang <wangzh...@jd.com>

Again, you need fix your git config file about your name.

> ---
>  lib/librte_vhost/socket.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c
> index 422da00..ea01327 100644
> --- a/lib/librte_vhost/socket.c
> +++ b/lib/librte_vhost/socket.c
> @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list {
>   struct vhost_user_socket *vsocket = vhost_user.vsockets[i];
>  
>   if (!strcmp(vsocket->path, path)) {
> + int del_fds[MAX_FDS];
> + int num_of_fds = 0, fd_index;
> +

I think the naming could be a bit shorter, like "fds, nr_fds (or nb_fds),
fd_idx".

>   if (vsocket->is_server) {
>   fdset_del(&vhost_user.fdset, 
> vsocket->socket_fd);
>   close(vsocket->socket_fd);
> @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list {
>   vhost_user_remove_reconnect(vsocket);
>   }
>  
> + /* fdset_del() must be called without conn_mutex. */
> + pthread_mutex_lock(&vsocket->conn_mutex);
> + for (conn = TAILQ_FIRST(&vsocket->conn_list);
> +  conn != NULL;
> +  conn = next) {
> + next = TAILQ_NEXT(conn, next);
> +
> + del_fds[num_of_fds++] = conn->connfd;
> + }
> + pthread_mutex_unlock(&vsocket->conn_mutex);
> +
> + for (fd_index = 0; fd_index < num_of_fds; fd_index++)
> + fdset_del(&vhost_user.fdset, del_fds[fd_index]);
> +
>   pthread_mutex_lock(&vsocket->conn_mutex);
>   for (conn = TAILQ_FIRST(&vsocket->conn_list);
>conn != NULL;
>conn = next) {
>   next = TAILQ_NEXT(conn, next);
>  
> - fdset_del(&vhost_user.fdset, conn->connfd);

If you log the fd here and invoke fdset_del() and close() after the loop,
you then could avoid one extra loop as you did above.

--yliu
>   RTE_LOG(INFO, VHOST_CONFIG,
>   "free connfd = %d for device '%s'\n",
>   conn->connfd, path);
> -- 
> 1.8.3.1
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Userspace space conntrack tcp issue

2018-01-01 Thread
Hi,

I am testing the scenario below, and I think there is an issue in the TCP conntrack
sequence number check.

Scenario:

VM1->Host1-Host2-->VM2

There is an SCP file copy between VM1 and VM2, and we have configured conntrack.
During the scp, I restart the openvswitch service (process stop and start). After
the restart, I see the subsequent TCP packets tagged as invalid by conntrack, and
the traffic cannot recover.
I did some debugging and found it fails on the check “(ackskew >= -MAXACKWINDOW)”
below. I am wondering, should it be “(ackskew >= -(MAXACKWINDOW << sws))”?

int ackskew = dst->seqlo - ack;
#define MAXACKWINDOW (0xffff + 1500)/* 1500 is an arbitrary fudge factor */
if (SEQ_GEQ(src->seqhi, end)
/* Last octet inside other's window space */
&& SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
/* Retrans: not more than one window back */
&& (ackskew >= -MAXACKWINDOW)
/* Acking not more than one reassembled fragment backwards */
&& (ackskew <= (MAXACKWINDOW << sws))
/* Acking not more than one window forward */
&& ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
|| (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {

Details:

                              TCP Client Seq  TCP Client ACK  TCP Server Seq  TCP Server ACK
Before the restart:           0x69f1536e      0xa3c81999      0xa3ca2d49      0x69f15302
After the restart (5s later): 0x69f15302      0xa3c81999      0xa3c561e1      0x69f15302

As we can see, the new seq 0xa3c561e1 (the server stepped back since previous
segments were not acked) is much less than 0xa3c81999 (the client keeps sending the
last acked packet), which leads to the failed check in conntrack.

I am using OVS2.7.0+dpdk16.11.3

Any thought?

Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] lib/conntrack: remove unnecessary addr check for ICMP.

2017-12-06 Thread
Hi Darrell,

In fact I did consider whether to keep a partial check as in your patch. My
idea is that it is better to do as little work as possible. So in my opinion
such validation (and even checksum validation and so on) is not necessary at
all (OVS is a middle device, and such error validation can be done by the end
host's stack).

I am open to the decision. So if you think your patch is more suitable, I can 
be the co-author.

Br,
Wang Zhike


-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Thursday, December 07, 2017 10:14 AM
To: 王志克; d...@openvswitch.org; Daniele Di Proietto
Subject: Re: [ovs-dev] [PATCH] lib/conntrack: remove unnecessary addr check for 
ICMP.

Hi Wang

To speed up the process, I sent an alternative patch here:

https://patchwork.ozlabs.org/patch/845407/

I agree the address sanity check is not correct but I think it should be 
partially retained
rather than removed. I also think a test was needed.

Pls let me know if it makes sense.
Also. if you prefer, I can make you the author.

Thanks Darrell




On 12/6/17, 11:22 AM, "Darrell Ball" <db...@vmware.com> wrote:

Thanks for looking at this.

In the commit message, can you delineate.

1/ The forward direction packet in terms of src ip, dest ip, L4 attributes
2/ The reverse direction error packet in terms of src ip, dest ip, icmp 
error payload

Darrell

On 12/4/17, 10:22 PM, "ovs-dev-boun...@openvswitch.org on behalf of wang 
zhike" <ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:

From: wangzhike <wangzh...@jd.com>

ICMP response (Unreachable/fragmentationRequired/...) may be created
at devices in the middle, and such packets are tagged as invalid in
user space conntrack. In fact it does not make sense to validate the
src and dest address.

Signed-off-by: wang zhike <wangzh...@jd.com>
---
 lib/conntrack.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/lib/conntrack.c b/lib/conntrack.c
index f5a3aa9..c44ad0f 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1702,11 +1702,6 @@ extract_l4_icmp(struct conn_key *key, const void 
*data, size_t size,
 return false;
 }
 
-if (inner_key.src.addr.ipv4_aligned != 
key->dst.addr.ipv4_aligned
-|| inner_key.dst.addr.ipv4_aligned != 
key->src.addr.ipv4_aligned) {
-return false;
-}
-
 key->src = inner_key.src;
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
@@ -1789,14 +1784,6 @@ extract_l4_icmp6(struct conn_key *key, const 
void *data, size_t size,
 return false;
 }
 
-/* pf doesn't do this, but it seems a good idea */
-if (!ipv6_addr_equals(&inner_key.src.addr.ipv6_aligned,
-  &key->dst.addr.ipv6_aligned)
-|| !ipv6_addr_equals(&inner_key.dst.addr.ipv6_aligned,
- &key->src.addr.ipv6_aligned)) {
-return false;
-}
-
 key->src = inner_key.src;
 key->dst = inner_key.dst;
 key->nw_proto = inner_key.nw_proto;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=IqOQ4UcLnOpQUkvP7FHVH6IWeAK_4DDBRBqX2w3wl94=Xmvgnl8plChpjcr77sLIOxE0krKVuNVqvS3_eOoNeMg=




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] How to set QoS for VM egress traffic on tunnel mode

2017-11-22 Thread
Hi,

The topo can be same as below example.
http://docs.openvswitch.org/en/latest/howto/tunneling/?highlight=tunnel

I just wonder how, in such a configuration, egress shaping can be configured
for different VMs. Or does QoS not work in the tunnel case?

Appreciate help.

BR,
Wang Zhike

From: 王志克
Sent: Wednesday, November 22, 2017 5:51 PM
To: ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
Subject: How to set QoS for VM egress traffic on tunnel mode

Hi All,

I want to set QoS with guide from below link “egress traffic shaping”, but do 
not know how for tunnel mode.
http://docs.openvswitch.org/en/latest/faq/qos/

My scenario:

I have several VM ports and several VXLAN ports in br0, and there is one
separate eth0 port (not in br0), which is the underlay port of the VXLAN ports.
Currently I add rules to match VM traffic to a certain VXLAN port, and all these
VXLAN ports go out through eth0.

Now I want to enable egress traffic shaping, eg:
VM1 goes out with min_rate=10M, max_rate=100M.
VM2 goes out with min_rate=50M, max_rate=200M.

In the example given in http://docs.openvswitch.org/en/latest/faq/qos/, chapter
“egress traffic shaping”, it directly uses the physical port. But in the tunnel
case, I can NOT specify the physical port directly.
So how do I configure egress traffic shaping in the tunnel case? I would appreciate
an example configuration. Thanks.


Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] How to set QoS for VM egress traffic on tunnel mode

2017-11-22 Thread
Hi All,

I want to set QoS with guide from below link “egress traffic shaping”, but do 
not know how for tunnel mode.
http://docs.openvswitch.org/en/latest/faq/qos/

My scenario:

I have several VM ports and several VXLAN ports in br0, and there is one
separate eth0 port (not in br0), which is the underlay port of the VXLAN ports.
Currently I add rules to match VM traffic to a certain VXLAN port, and all these
VXLAN ports go out through eth0.

Now I want to enable egress traffic shaping, eg:
VM1 goes out with min_rate=10M, max_rate=100M.
VM2 goes out with min_rate=50M, max_rate=200M.

In the example given in http://docs.openvswitch.org/en/latest/faq/qos/, chapter
“egress traffic shaping”, it directly uses the physical port. But in the tunnel
case, I can NOT specify the physical port directly.
So how do I configure egress traffic shaping in the tunnel case? I would appreciate
an example configuration. Thanks.


Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS+DPDK vhost-user-client port hang

2017-11-01 Thread
Hi,
I met one issue with qemu2.8.1.1+ovs2.7.0+dpdk16.11.0. The issue is that one 
windows 2008 VM can NOT send/receive packet anymore. I even can not 
re-initialize the virtio adapter in VM (no reponse). I can NOT reproduce it.
>From OVS+DPDK stats, there is no packet from the vhost-user-client port. The 
>port is Up using ovs-ofctl show.
I observed below log in qemu.

2017-11-02T03:48:48.265767Z qemu-kvm: virtio_ioport_write: unexpected address 
0x13 value 0x0
2017-11-02T03:48:50.265766Z qemu-kvm: virtio_ioport_write: unexpected address 
0x13 value 0x0
2017-11-02T03:48:52.265764Z qemu-kvm: virtio_ioport_write: unexpected address 
0x13 value 0x0
2017-11-02T03:48:54.265750Z qemu-kvm: virtio_ioport_write: unexpected address 
0x13 value 0x0
2017-11-02T03:48:55.452554Z qemu-kvm: Failed to set msg fds.
2017-11-02T03:48:55.452591Z qemu-kvm: vhost VQ 0 ring restore failed: -1: 
Resource temporarily unavailable (11)
2017-11-02T03:48:55.452601Z qemu-kvm: Failed to set msg fds.
2017-11-02T03:48:55.452606Z qemu-kvm: vhost VQ 1 ring restore failed: -1: 
Resource temporarily unavailable (11)
2017-11-02T03:49:10.040044Z qemu-kvm: terminating on signal 15 from pid 17716 
(/usr/sbin/libvirtd)

Has anyone met a similar issue? Any idea how to solve it? Thanks.

Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] CPU freqency is not max when running ovs_DPDK

2017-09-22 Thread
Hi All,

I am using OVS_DPDK, and the target CPU is running at 100%. However, I notice the
CPU frequency does NOT reach the max value, so the performance may not reach
its best. I have only 2 PMDs for now, each a hyper-thread core on one
physical core.

I am not sure of the reason, and did not get any ideas after going through the BIOS
and Google. Can someone kindly indicate the root cause? I am using CentOS 7.
Appreciate your help.

# cat /sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_cur_freq
2599980    <-- I expect it would be 3000000
# cat /sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_max_freq
3000000
# lscpu
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):64
On-line CPU(s) list:   0-63
Thread(s) per core:2
Core(s) per socket:16
Socket(s): 2
NUMA node(s):  2
Vendor ID: GenuineIntel
CPU family:6
Model: 79
Model name:Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
Stepping:  1
CPU MHz:   1199.953
BogoMIPS:  4195.88
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  40960K
NUMA node0 CPU(s): 0-15,32-47
NUMA node1 CPU(s): 16-31,48-63


Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ovs_dpdk: dpdk-socket-mem usage question

2017-09-20 Thread
Thanks Billy.

I will tune it during my test while trying to read the related code to
understand the logic.

Br,
Wang Zhike

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Tuesday, September 19, 2017 9:07 PM
To: 王志克; ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
Subject: RE: ovs_dpdk: dpdk-socket-mem usage question

Hi Wang,

Typically I reserve between 512M and 1G on each Numa.

There is no formula I am aware of for how much memory is actually required.

Fundamentally this will be determined by the maximum number and size of packets
in flight at any given time, which is determined by the ingress packet rate, the
processing time in OVS, and the rate and frequency at which egress queues are
drained.

The maximum memory requirement is determined by the number of rx and tx queues
and how many descriptors each has. Longer queues (more descriptors) also protect
against packet loss up to a point, so QoS/throughput comes into play as well.

On that point, for dpdkvhostuser ports, as far as I know current versions of QEMU
have the virtio queue length fixed at compile time, so those queue lengths cannot
be modified by OVS at all.

In short I don't think there is any way other than testing and tuning of the 
dpdk application (in this case OVS) and the particular use case while 
monitoring internal queue usage. This should give you an idea of an acceptable 
maximum length for the various queues and a good first guess as to the total 
amount of memory required.
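
As a rough worked example (all numbers are assumptions, not a recommendation):
2 physical ports x 2 rx queues x 2048 descriptors is 8192 buffers potentially
outstanding on rx; adding a similar allowance for tx queues, vhost queues and
packets in flight might bring you to roughly 20k mbufs. At about 3 kB per mbuf
(2048 B of data room plus headroom and metadata) that is on the order of 60 MB,
which is why a 512M-1G reservation per NUMA node normally leaves plenty of
headroom.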

Regards,
Billy.



> -Original Message-
> From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-
> boun...@openvswitch.org] On Behalf Of 王志克
> Sent: Wednesday, September 13, 2017 6:35 AM
> To: ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
> Subject: [ovs-dev] ovs_dpdk: dpdk-socket-mem usage question
> 
> Hi All,
> 
> I read below doc, and have one question:
> 
> http://docs.openvswitch.org/en/latest/intro/install/dpdk/
> dpdk-socket-mem
> Comma separated list of memory to pre-allocate from hugepages on specific
> sockets.
> 
> Question:
>OVS+DPDK can let user to specify the needed memory using dpdk-socket-
> mem. But the question is that how to know how much memory is needed. Is
> there some algorithm on how to calculate the memory?Thanks.
> 
> Br,
> Wang Zhike
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] How to define a feature macro in OVS

2017-09-14 Thread
Hi All,

I want to submit a feature which should be disabled by default. I plan to
define a compile-time macro, but I do not know how.

Can someone guide me on how to add such a macro in OVS? Thanks.
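
In case a pointer helps, a minimal sketch of the usual autoconf pattern (the
names below are placeholders, not an existing OVS convention): add an
AC_ARG_ENABLE/AC_DEFINE pair in configure.ac so that --enable-my-feature defines
HAVE_MY_FEATURE, and then guard the code:

    #ifdef HAVE_MY_FEATURE
        /* Feature implementation, compiled only when --enable-my-feature
         * was passed to configure.  my_feature_init() is hypothetical. */
        my_feature_init();
    #endif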

Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] ovs_dpdk: dpdk-socket-mem usage question

2017-09-12 Thread
Hi All,

I read below doc, and have one question:

http://docs.openvswitch.org/en/latest/intro/install/dpdk/
dpdk-socket-mem
Comma separated list of memory to pre-allocate from hugepages on specific 
sockets.

Question:
   OVS+DPDK lets the user specify the needed memory using dpdk-socket-mem.
But the question is how to know how much memory is needed. Is there some
algorithm for calculating the memory? Thanks.

Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-10 Thread
Hi Jan,

Do you have some test data about the cross-NUMA impact?

Thanks.

Br,
Wang Zhike

-Original Message-
From: Jan Scheurich [mailto:jan.scheur...@ericsson.com] 
Sent: Wednesday, September 06, 2017 9:33 PM
To: O Mahony, Billy; 王志克; Darrell Ball; ovs-disc...@openvswitch.org; 
ovs-dev@openvswitch.org; Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Billy,

> You are going to have to take the hit crossing the NUMA boundary at some 
> point if your NIC and VM are on different NUMAs.
> 
> So are you saying that it is more expensive to cross the NUMA boundary from 
> the pmd to the VM that to cross it from the NIC to the
> PMD?

Indeed, that is the case: If the NIC crosses the QPI bus when storing packets 
in the remote NUMA there is no cost involved for the PMD. (The QPI bandwidth is 
typically not a bottleneck.) The PMD only performs local memory access.

On the other hand, if the PMD crosses the QPI when copying packets into a 
remote VM, there is a huge latency penalty involved, consuming lots of PMD 
cycles that cannot be spent on processing packets. We at Ericsson have observed 
exactly this behavior.

This latency penalty becomes even worse when the LLC cache hit rate is degraded 
due to LLC cache contention with real VNFs and/or unfavorable packet buffer 
re-use patterns as exhibited by real VNFs compared to typical synthetic 
benchmark apps like DPDK testpmd.

> 
> If so then in that case you'd like to have two (for example) PMDs polling 2 
> queues on the same NIC. With the PMDs on each of the
> NUMA nodes forwarding to the VMs local to that NUMA?
> 
> Of course your NIC would then also need to be able know which VM (or at least 
> which NUMA the VM is on) in order to send the frame
> to the correct rxq.

That would indeed be optimal but hard to realize in the general case (e.g. with 
VXLAN encapsulation) as the actual destination is only known after tunnel pop. 
Here perhaps some probabilistic steering of RSS hash values based on measured 
distribution of final destinations might help in the future.

But even without that in place, we need PMDs on both NUMAs anyhow (for 
NUMA-aware polling of vhostuser ports), so why not use them to also poll remote 
eth ports. We can achieve better average performance with fewer PMDs than with 
the current limitation to NUMA-local polling.

BR, Jan

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-10 Thread
Hi Billy,

In my test, almost all traffic went through the EMC. So the fix does not impact 
the result, especially since we want to know the difference (not the exact numbers).

Can you test to get some data? Thanks.

Br,
Wang Zhike

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Friday, September 08, 2017 11:18 PM
To: 王志克; ovs-dev@openvswitch.org; Jan Scheurich; Darrell Ball; 
ovs-disc...@openvswitch.org; Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337309.html

I see it's been acked and is due to be pushed to master with other changes on 
the dpdk merge branch so you'll have to apply it manually for now.

/Billy. 

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Friday, September 8, 2017 11:48 AM
> To: ovs-dev@openvswitch.org; Jan Scheurich
> <jan.scheur...@ericsson.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; Darrell Ball <db...@vmware.com>; ovs-
> disc...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> I used OVS 2.7.0. I searched the git log, but I am not sure which commit it is. 
> Do you happen to know?
> 
> Yes, I cleared the stats after traffic run.
> 
> Br,
> Wang Zhike
> 
> 
> From: "O Mahony, Billy" <billy.o.mah...@intel.com>
> To: "wangzh...@jd.com" <wangzh...@jd.com>, Jan Scheurich
>   <jan.scheur...@ericsson.com>, Darrell Ball <db...@vmware.com>,
>   "ovs-disc...@openvswitch.org" <ovs-disc...@openvswitch.org>,
>   "ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>, Kevin
> Traynor
>   <ktray...@redhat.com>
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
>   physical port
> Message-ID:
>   <03135aea779d444e90975c2703f148dc58c19...@irsmsx107.ger.c
> orp.intel.com>
> 
> Content-Type: text/plain; charset="utf-8"
> 
> Hi Wang,
> 
> Thanks for the figures. Unexpected results as you say. Two things come to
> mind:
> 
> I'm not sure what code you are using but the cycles per packet statistic was
> broken for a while recently. Ilya posted a patch to fix it so make sure you
> have that patch included.
> 
> Also remember to reset the pmd stats after you start your traffic and then
> measure after a short duration.
> 
> Regards,
> Billy.
> 
> 
> 
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Friday, September 8, 2017 8:01 AM
> To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; Darrell Ball <db...@vmware.com>; ovs-
> disc...@openvswitch.org; ovs-dev@openvswitch.org; Kevin Traynor
> <ktray...@redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> 
> Hi All,
> 
> 
> 
> I tested the cases below and got some performance data. The data shows there
> is little impact from cross-NUMA communication, which is different from my
> expectation. (Previously I mentioned that cross-NUMA would add 60%
> cycles, but I can NOT reproduce it any more.)
> 
> 
> 
> @Jan,
> 
> You mentioned cross NUMA communication would cost lots more cycles. Can
> you share your data? I am not sure whether I made some mistake or not.
> 
> 
> 
> @All,
> 
> Welcome your data if you have data for similar cases. Thanks.
> 
> 
> 
> Case1: VM0->PMD0->NIC0
> 
> Case2:VM1->PMD1->NIC0
> 
> Case3:VM1->PMD0->NIC0
> 
> Case4:NIC0->PMD0->VM0
> 
> Case5:NIC0->PMD1->VM1
> 
> Case6:NIC0->PMD0->VM1
> 
> 
> 
>        VM Tx Mpps  Host Tx Mpps  avg cycles per packet  avg processing cycles per packet
> Case1  1.4         1.4           512                    415
> Case2  1.3         1.3           537                    436
> Case3  1.35        1.35          514                    390
> 
>        VM Rx Mpps  Host Rx Mpps  avg cycles per packet  avg processing cycles per packet
> Case4  1.3         1.3           549                    533
> Case5  1.3         1.3           559                    540
> Case6  1.28        1.28          568                    551
> 
> 
> 
> Br,
> 
> Wang Zhike
> 
> 
> 
> -Original Message-
> From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
> Sent: Wednesd

Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-08 Thread
Hi All,



I tested the cases below and got some performance data. The data shows there is 
little impact from cross-NUMA communication, which is different from my 
expectation. (Previously I mentioned that cross-NUMA would add 60% cycles, but 
I can NOT reproduce it any more.)



@Jan,

You mentioned cross NUMA communication would cost lots more cycles. Can you 
share your data? I am not sure whether I made some mistake or not.



@All,

Welcome your data if you have data for similar cases. Thanks.



Case1: VM0->PMD0->NIC0

Case2:VM1->PMD1->NIC0

Case3:VM1->PMD0->NIC0

Case4:NIC0->PMD0->VM0

Case5:NIC0->PMD1->VM1

Case6:NIC0->PMD0->VM1



       VM Tx Mpps  Host Tx Mpps  avg cycles per packet  avg processing cycles per packet
Case1  1.4         1.4           512                    415
Case2  1.3         1.3           537                    436
Case3  1.35        1.35          514                    390

       VM Rx Mpps  Host Rx Mpps  avg cycles per packet  avg processing cycles per packet
Case4  1.3         1.3           549                    533
Case5  1.3         1.3           559                    540
Case6  1.28        1.28          568                    551



Br,

Wang Zhike



-Original Message-
From: Jan Scheurich [mailto:jan.scheur...@ericsson.com]
Sent: Wednesday, September 06, 2017 9:33 PM
To: O Mahony, Billy; 王志克; Darrell Ball; 
ovs-disc...@openvswitch.org<mailto:ovs-disc...@openvswitch.org>; 
ovs-dev@openvswitch.org<mailto:ovs-dev@openvswitch.org>; Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port



Hi Billy,



> You are going to have to take the hit crossing the NUMA boundary at some 
> point if your NIC and VM are on different NUMAs.

>

> So are you saying that it is more expensive to cross the NUMA boundary from 
> the pmd to the VM than to cross it from the NIC to the

> PMD?



Indeed, that is the case: If the NIC crosses the QPI bus when storing packets 
in the remote NUMA there is no cost involved for the PMD. (The QPI bandwidth is 
typically not a bottleneck.) The PMD only performs local memory access.



On the other hand, if the PMD crosses the QPI when copying packets into a 
remote VM, there is a huge latency penalty involved, consuming lots of PMD 
cycles that cannot be spent on processing packets. We at Ericsson have observed 
exactly this behavior.



This latency penalty becomes even worse when the LLC cache hit rate is degraded 
due to LLC cache contention with real VNFs and/or unfavorable packet buffer 
re-use patterns as exhibited by real VNFs compared to typical synthetic 
benchmark apps like DPDK testpmd.



>

> If so then in that case you'd like to have two (for example) PMDs polling 2 
> queues on the same NIC. With the PMDs on each of the

> NUMA nodes forwarding to the VMs local to that NUMA?

>

> Of course your NIC would then also need to be able to know which VM (or at least 
> which NUMA the VM is on) in order to send the frame

> to the correct rxq.



That would indeed be optimal but hard to realize in the general case (e.g. with 
VXLAN encapsulation) as the actual destination is only known after tunnel pop. 
Here perhaps some probabilistic steering of RSS hash values based on measured 
distribution of final destinations might help in the future.



But even without that in place, we need PMDs on both NUMAs anyhow (for 
NUMA-aware polling of vhostuser ports), so why not use them to also poll remote 
eth ports. We can achieve better average performance with fewer PMDs than with 
the current limitation to NUMA-local polling.



BR, Jan


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ovs+dpdk: no way to set pmd-rxq-affinity for multiple queues to 2 pmd

2017-09-07 Thread
Thanks Darrell.

It indeed works.

Br,
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Friday, September 08, 2017 12:34 AM
To: 王志克; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] ovs+dpdk: no way to set pmd-rxq-affinity for multiple 
queues to 2 pmd

Here it is again after fixing the line breaks.
 

On 9/7/17, 9:30 AM, "Darrell Ball" <db...@vmware.com> wrote:

Hi Lawrence

I think you wanted to set the rxq affinity in a single command.
Here is a simplified version for illustration.

darrell@prmh-nsx-perf-server125:~/ovs/ovs_master$ sudo ovs-appctl 
dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 0:
isolated : false
port: dpdk0 queue-id: 0
pmd thread numa_id 1 core_id 1:
isolated : false
pmd thread numa_id 0 core_id 2:
isolated : false
port: dpdk1 queue-id: 0
pmd thread numa_id 1 core_id 3:
isolated : false

darrell@prmh-nsx-perf-server125:~/ovs/ovs_master$ sudo ovs-vsctl set 
interface dpdk0 other_config:pmd-rxq-affinity="0:0,1:2,2:0,3:2,4:0,5:2,6:0,7:2"

darrell@prmh-nsx-perf-server125:~/ovs/ovs_master$ sudo ovs-appctl 
dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 0:
isolated : true
port: dpdk0 queue-id: 0 2 4 6
pmd thread numa_id 1 core_id 1:
isolated : false
port: dpdk1 queue-id: 0
pmd thread numa_id 0 core_id 2:
isolated : true
port: dpdk0 queue-id: 1 3 5 7
pmd thread numa_id 1 core_id 3:
isolated : false

Thanks Darrell



On 9/7/17, 5:41 AM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:

Hi,

Please see below log:

It seems there is no way to set pmd-rxq-affinity for multiple queues to 2 pmds. I 
think it is a bug.

[root@A01-R08-I24-169 wangzhike]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 4:
 isolated : false
 port: dpdk0  queue-id: 3 7
 port: port-ve54hf69ys  queue-id: 0
pmd thread numa_id 0 core_id 36:
 isolated : false
 port: dpdk0  queue-id: 1 5
pmd thread numa_id 1 core_id 20:
 isolated : true
 port: dpdk0  queue-id: 0 2 4 6
pmd thread numa_id 1 core_id 52:
 isolated : false
 port: port-6s9isqsttp   queue-id: 0
 port: port-b1ri1292y7  queue-id: 0
 port: port-l8n2dvgyijqueue-id: 0
[root@A01-R08-I24-169 wangzhike]# ovs-vsctl set interface dpdk0 
other_config:pmd-rxq-affinity="1:52,3:52,5:52,7:52"
[root@A01-R08-I24-169 wangzhike]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 4:
 isolated : false
 port: dpdk0  queue-id: 2 6
 port: port-ve54hf69ys  queue-id: 0
pmd thread numa_id 0 core_id 36:
 isolated : false
 port: dpdk0  queue-id: 0 4
pmd thread numa_id 1 core_id 20:
 isolated : false
 port: port-6s9isqsttp   queue-id: 0
 port: port-b1ri1292y7  queue-id: 0
 port: port-l8n2dvgyijqueue-id: 0
pmd thread numa_id 1 core_id 52:
 isolated : true
 port: dpdk0  queue-id: 1 3 5 7
[root@A01-R08-I24-169 wangzhike]# ovs-vsctl set interface dpdk0 
other_config:pmd-rxq-affinity="0:20,2:20,4:20,6:20"
[root@A01-R08-I24-169 wangzhike]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 4:
 isolated : false
 port: dpdk0  queue-id: 3 7
 port: port-ve54hf69ys  queue-id: 0
pmd thread numa_id 0 core_id 36:
 isolated : false
 port: dpdk0  queue-id: 1 5
pmd thread numa_id 1 core_id 20:
 isolated : true
 port: dpdk0  queue-id: 0 2 4 6
pmd thread numa_id 1 core_id 52:
 isolated : false
 port: port-6s9isqsttp   queue-id: 0
 port: port-b1ri1292y7  queue-id: 0
 port: port-l8n2dvgyijqueue-id: 0

Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=NDgM-S7tXaAGnfHlWlrxKYETL_P3vsrVNEPjwhbpo6k=WMde1LMSDMEWXSC3oYomkW1efTaHpoJjcYyEFj1DCyY=
 




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] ovs+dpdk: no way to set pmd-rxq-affinity for multiple queues to 2 pmd

2017-09-07 Thread
Hi,

Please see below log:

It seems there is no way to set pmd-rxq-affinity for multiple queues to 2 pmds. I think 
it is a bug.

[root@A01-R08-I24-169 wangzhike]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 4:
 isolated : false
 port: dpdk0  queue-id: 3 7
 port: port-ve54hf69ys  queue-id: 0
pmd thread numa_id 0 core_id 36:
 isolated : false
 port: dpdk0  queue-id: 1 5
pmd thread numa_id 1 core_id 20:
 isolated : true
 port: dpdk0  queue-id: 0 2 4 6
pmd thread numa_id 1 core_id 52:
 isolated : false
 port: port-6s9isqsttp   queue-id: 0
 port: port-b1ri1292y7  queue-id: 0
 port: port-l8n2dvgyijqueue-id: 0
[root@A01-R08-I24-169 wangzhike]# ovs-vsctl set interface dpdk0 
other_config:pmd-rxq-affinity="1:52,3:52,5:52,7:52"
[root@A01-R08-I24-169 wangzhike]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 4:
 isolated : false
 port: dpdk0  queue-id: 2 6
 port: port-ve54hf69ys  queue-id: 0
pmd thread numa_id 0 core_id 36:
 isolated : false
 port: dpdk0  queue-id: 0 4
pmd thread numa_id 1 core_id 20:
 isolated : false
 port: port-6s9isqsttp   queue-id: 0
 port: port-b1ri1292y7  queue-id: 0
 port: port-l8n2dvgyijqueue-id: 0
pmd thread numa_id 1 core_id 52:
 isolated : true
 port: dpdk0  queue-id: 1 3 5 7
[root@A01-R08-I24-169 wangzhike]# ovs-vsctl set interface dpdk0 
other_config:pmd-rxq-affinity="0:20,2:20,4:20,6:20"
[root@A01-R08-I24-169 wangzhike]# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 4:
 isolated : false
 port: dpdk0  queue-id: 3 7
 port: port-ve54hf69ys  queue-id: 0
pmd thread numa_id 0 core_id 36:
 isolated : false
 port: dpdk0  queue-id: 1 5
pmd thread numa_id 1 core_id 20:
 isolated : true
 port: dpdk0  queue-id: 0 2 4 6
pmd thread numa_id 1 core_id 52:
 isolated : false
 port: port-6s9isqsttp   queue-id: 0
 port: port-b1ri1292y7  queue-id: 0
 port: port-l8n2dvgyijqueue-id: 0

Br,
Wang Zhike
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread


-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 10:49 PM
To: Kevin Traynor; Jan Scheurich; 王志克; Darrell Ball; 
ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port



> -Original Message-
> From: Kevin Traynor [mailto:ktray...@redhat.com]
> Sent: Wednesday, September 6, 2017 2:50 PM
> To: Jan Scheurich <jan.scheur...@ericsson.com>; O Mahony, Billy
> <billy.o.mah...@intel.com>; wangzh...@jd.com; Darrell Ball
> <db...@vmware.com>; ovs-disc...@openvswitch.org; ovs-
> d...@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> On 09/06/2017 02:33 PM, Jan Scheurich wrote:
> > Hi Billy,
> >
> >> You are going to have to take the hit crossing the NUMA boundary at
> some point if your NIC and VM are on different NUMAs.
> >>
> >> So are you saying that it is more expensive to cross the NUMA
> >> boundary from the pmd to the VM than to cross it from the NIC to the
> PMD?
> >
> > Indeed, that is the case: If the NIC crosses the QPI bus when storing
> packets in the remote NUMA there is no cost involved for the PMD. (The QPI
> bandwidth is typically not a bottleneck.) The PMD only performs local
> memory access.
> >
> > On the other hand, if the PMD crosses the QPI when copying packets into a
> remote VM, there is a huge latency penalty involved, consuming lots of PMD
> cycles that cannot be spent on processing packets. We at Ericsson have
> observed exactly this behavior.
> >
> > This latency penalty becomes even worse when the LLC cache hit rate is
> degraded due to LLC cache contention with real VNFs and/or unfavorable
> packet buffer re-use patterns as exhibited by real VNFs compared to typical
> synthetic benchmark apps like DPDK testpmd.
> >
> >>
> >> If so then in that case you'd like to have two (for example) PMDs
> >> polling 2 queues on the same NIC. With the PMDs on each of the NUMA
> nodes forwarding to the VMs local to that NUMA?
> >>
> >> Of course your NIC would then also need to be able to know which VM (or
> >> at least which NUMA the VM is on) in order to send the frame to the
> correct rxq.
> >
> > That would indeed be optimal but hard to realize in the general case (e.g.
> with VXLAN encapsulation) as the actual destination is only known after
> tunnel pop. Here perhaps some probabilistic steering of RSS hash values
> based on measured distribution of final destinations might help in the future.
> >
> > But even without that in place, we need PMDs on both NUMAs anyhow
> (for NUMA-aware polling of vhostuser ports), so why not use them to also
> poll remote eth ports. We can achieve better average performance with
> fewer PMDs than with the current limitation to NUMA-local polling.
> >
> 
> If the user has some knowledge of the numa locality of ports and can place
> VMs accordingly, default cross-numa assignment can harm performance.
> Also, it would make for very unpredictable performance from test to test and
> even for flow to flow on a datapath.
[[BO'M]] Wang's original request would constitute default cross-NUMA assignment, 
but I don't think this modified proposal would, as it still requires explicit 
config to assign to the remote NUMA.

[Wangzhike] I think either a configuration option or a compile option is OK for me, 
since only the physical NIC rxq needs to be configured. It is only a one-shot job.
Regarding the test concern, I think it is worth clarifying the performance 
difference if the new behavior improves the rx throughput a lot.
> 
> Kevin.
> 
> > BR, Jan
> >

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread
Hi Billy,

Please see my reply in line.

Br,
Wang Zhike

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 9:01 PM
To: 王志克; Darrell Ball; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org; 
Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

I think the mention of pinning was confusing me a little. Let me see if I fully 
understand your use case: you don't 'want' to pin anything, but you are using 
pinning as a way to force the distribution of rxqs from a single NIC across PMDs 
on different NUMAs, since without pinning all rxqs are assigned to the NUMA-local 
pmd, leaving the other PMD totally unused.

But then when you used pinning, the PMDs became isolated, so the vhostuser 
ports' rxqs would not be assigned to the PMDs unless they too were pinned. That 
worked but was not manageable as VMs (and vhost ports) came and went.

Yes? 
[Wang Zhike] Yes, exactly.

In that case what we probably want is the ability to pin an rxq to a pmd but 
without also isolating the pmd. So the PMD could be assigned some rxqs manually 
and still have others automatically assigned. 

But what I still don't understand is why you don't put both PMDs on the same 
NUMA node. Given that you cannot program the NIC to know which VM a frame is 
for, you would have to RSS the frames across rxqs (i.e. across NUMA nodes). 
Of those going to the NIC's local NUMA node, 50% would have to go across the NUMA 
boundary when their destination VM was decided - which is okay - they have to 
cross the boundary at some point. But of the frames going to the non-local NUMA, 
50% will actually be destined for what was originally the local NUMA 
node. These packets (25% of all traffic) will cross NUMA *twice*, 
whereas if all PMDs were on the NIC's NUMA node those frames would never have 
had to pass between NUMA nodes.

In short I think it's more efficient to have both PMDs on the same NUMA node as 
the NIC.

[Wang Zhike] Considering the Tx direction, i.e. from a VM on a different NUMA node 
to the phy NIC, I am not sure whether your proposal would degrade the TX 
performance...
I will try to test different cross-NUMA scenarios to get the performance penalty 
data.

There is one more comments below..

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 12:50 PM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> <db...@vmware.com>; ovs-disc...@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> See my reply in line.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 7:26 PM
> To: 王志克; Darrell Ball; ovs-disc...@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> You are going to have to take the hit crossing the NUMA boundary at some
> point if your NIC and VM are on different NUMAs.
> 
> So are you saying that it is more expensive to cross the NUMA boundary
> from the pmd to the VM than to cross it from the NIC to the PMD?
> 
> [Wang Zhike] I do not have such data. I hope we can try the new behavior
> and get the test result, and then know whether and how much performance
> can be improved.

[[BO'M]] You don't need a code change to compare the performance of these two 
scenarios. You can simulate it by pinning queues to VMs. I'd imagine crossing 
the NUMA boundary during the PCI DMA would be cheaper than crossing it over 
vhost. But I don't know what the result would be, and this would be a pretty 
interesting figure to have, by the way.


> 
> If so then in that case you'd like to have two (for example) PMDs polling 2
> queues on the same NIC. With the PMDs on each of the NUMA nodes
> forwarding to the VMs local to that NUMA?
> 
> Of course your NIC would then also need to be able to know which VM (or at
> least which NUMA the VM is on) in order to send the frame to the correct
> rxq.
> 
> [Wang Zhike] Currently I do not know how to achieve it. From my view, the NIC
> does not know which NUMA node is the destination of the packet. Only
> after OVS handling (e.g. looking up the forwarding rule in OVS) can it know
> the destination. If the NIC does not know the destination NUMA socket, it does
> not matter which PMD polls it.
> 
> 
> /Billy.
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Wednesday, September 6, 2017 11:41 AM
> > To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> > <db

Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread
Hi Billy,

See my reply in line.

Br,
Wang Zhike

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 7:26 PM
To: 王志克; Darrell Ball; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org; 
Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

You are going to have to take the hit crossing the NUMA boundary at some point 
if your NIC and VM are on different NUMAs.

So are you saying that it is more expensive to cross the NUMA boundary from the 
pmd to the VM than to cross it from the NIC to the PMD?

[Wang Zhike] I do not have such data. I hope we can try the new behavior and 
get the test result, and then know whether and how much performance can be 
improved.

If so then in that case you'd like to have two (for example) PMDs polling 2 
queues on the same NIC. With the PMDs on each of the NUMA nodes forwarding to 
the VMs local to that NUMA?

Of course your NIC would then also need to be able to know which VM (or at least 
which NUMA the VM is on) in order to send the frame to the correct rxq. 

[Wang Zhike] Currently I do not know how to achieve it. From my view, the NIC does 
not know which NUMA node is the destination of the packet. Only after OVS 
handling (e.g. looking up the forwarding rule in OVS) can it know the 
destination. If the NIC does not know the destination NUMA socket, it does not 
matter which PMD polls it.


/Billy. 

> -Original Message-
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 11:41 AM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> <db...@vmware.com>; ovs-disc...@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> It depends on the destination of the traffic.
> 
> I observed that if the traffic destination is across NUMA socket, the "avg
> processing cycles per packet" would increase 60% than the traffic to same
> NUMA socket.
> 
> Br,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 6:35 PM
> To: 王志克; Darrell Ball; ovs-disc...@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> If you create several PMDs on the NUMA of the physical port does that have
> the same performance characteristic?
> 
> /Billy
> 
> 
> 
> > -Original Message-
> > From: 王志克 [mailto:wangzh...@jd.com]
> > Sent: Wednesday, September 6, 2017 10:20 AM
> > To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> > <db...@vmware.com>; ovs-disc...@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > Yes, I want to achieve better performance.
> >
> > The commit "dpif-netdev: Assign ports to pmds on non-local numa node"
> > can NOT meet my needs.
> >
> > I do have a pmd on socket 0 to poll the physical NIC, which is also on socket 
> > 0.
> > However, this is not enough since I also have other pmds on socket 1. I
> > hope such pmds on socket 1 can also help poll the physical NIC. In this
> > way, we have more CPUs (in my case, double the CPUs) to poll the NIC, which
> > results in a performance improvement.
> >
> > BR,
> > Wang Zhike
> >
> > -Original Message-
> > From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> > Sent: Wednesday, September 06, 2017 5:14 PM
> > To: Darrell Ball; 王志克; ovs-disc...@openvswitch.org; ovs-
> > d...@openvswitch.org; Kevin Traynor
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Wang,
> >
> > A change was committed to head of master 2017-08-02 "dpif-netdev:
> > Assign ports to pmds on non-local numa node" which if I understand
> > your request correctly will do what you require.
> >
> > However it is not clear to me why you are pinning rxqs to PMDs in the
> > first instance. Currently if you configure at least one pmd on each
> > numa there should always be a PMD available. Is the pinning for
> performance reasons?
> >
> > Regards,
> > Billy
> >
> >
> >
> > > -Original Message-
> > > From: Darrell Ball [mailto:db...@vmware.com]
> > > Sent: Wednesday, September 6, 2017 8:25 AM

Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread
Hi Billy,

It depends on the destination of the traffic.

I observed that if the traffic destination is across NUMA socket, the "avg 
processing cycles per packet" would increase 60% than the traffic to same NUMA 
socket.

Br,
Wang Zhike

-Original Message-
From: O Mahony, Billy [mailto:billy.o.mah...@intel.com] 
Sent: Wednesday, September 06, 2017 6:35 PM
To: 王志克; Darrell Ball; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org; 
Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

If you create several PMDs on the NUMA of the physical port does that have the 
same performance characteristic? 

/Billy



> -Original Message-----
> From: 王志克 [mailto:wangzh...@jd.com]
> Sent: Wednesday, September 6, 2017 10:20 AM
> To: O Mahony, Billy <billy.o.mah...@intel.com>; Darrell Ball
> <db...@vmware.com>; ovs-disc...@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor <ktray...@redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> Yes, I want to achieve better performance.
> 
> The commit "dpif-netdev: Assign ports to pmds on non-local numa node" can
> NOT meet my needs.
> 
> I do have a pmd on socket 0 to poll the physical NIC, which is also on socket 0.
> However, this is not enough since I also have other pmds on socket 1. I hope
> such pmds on socket 1 can also help poll the physical NIC. In this way, we have
> more CPUs (in my case, double the CPUs) to poll the NIC, which results in
> a performance improvement.
> 
> BR,
> Wang Zhike
> 
> -Original Message-
> From: O Mahony, Billy [mailto:billy.o.mah...@intel.com]
> Sent: Wednesday, September 06, 2017 5:14 PM
> To: Darrell Ball; 王志克; ovs-disc...@openvswitch.org; ovs-
> d...@openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> A change was committed to head of master 2017-08-02 "dpif-netdev: Assign
> ports to pmds on non-local numa node" which if I understand your request
> correctly will do what you require.
> 
> However it is not clear to me why you are pinning rxqs to PMDs in the first
> instance. Currently if you configure at least one pmd on each numa there
> should always be a PMD available. Is the pinning for performance reasons?
> 
> Regards,
> Billy
> 
> 
> 
> > -Original Message-
> > From: Darrell Ball [mailto:db...@vmware.com]
> > Sent: Wednesday, September 6, 2017 8:25 AM
> > To: 王志克 <wangzh...@jd.com>; ovs-disc...@openvswitch.org; ovs-
> > d...@openvswitch.org; O Mahony, Billy <billy.o.mah...@intel.com>;
> Kevin
> > Traynor <ktray...@redhat.com>
> > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Adding Billy and Kevin
> >
> >
> > On 9/6/17, 12:22 AM, "Darrell Ball" <db...@vmware.com> wrote:
> >
> >
> >
> > On 9/6/17, 12:03 AM, "王志克" <wangzh...@jd.com> wrote:
> >
> > Hi Darrell,
> >
> > pmd-rxq-affinity has below limitation: (so isolated pmd can
> > not be used for others, which is not my expectation. Lots of VMs come
> > and go on the fly, and manual assignment is not feasible.)
> >   >>After that PMD threads on cores where RX queues
> > was pinned will become isolated. This means that this thread will poll
> > only pinned RX queues
> >
> > My problem is that I have several CPUs spreading on different
> > NUMA nodes. I hope all these CPU can have chance to serve the rxq.
> > However, because the phy NIC only locates on one certain socket node,
> > non-same numa pmd/CPU would be excluded. So I am wondering whether
> we
> > can have different behavior for phy port rxq:
> >   round-robin to all PMDs even the pmd on different NUMA socket.
> >
> > I guess this is a common case, and I believe it would improve
> > rx performance.
> >
> >
> > [Darrell] I agree it would be a common problem and some
> > distribution would seem to make sense, maybe factoring in some
> > favoring of local numa PMDs ?
> > Maybe an optional config to enable ?
> >
> >
> > Br,
> > Wang Zhike
> >
> >

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] dev Digest, Vol 98, Issue 38

2017-09-06 Thread
Hi Kevin,

Consider the scenario:

One host with 1 physical NIC, and the NIC locates on NUMA socket0. There are 
lots of VMs on this host.

I can see several methods to improve the performance:
1) Try to make sure the VM memory used for networking always locates on socket0. 
E.g., if a VM uses 4G memory, we can split off 1G for networking and have this 1G 
come from socket 0. In this way, we can always allocate CPUs from socket 0 
only. I do not know whether this is feasible or not.
2) If option 1 is not feasible, then VM memory would spread across NUMA sockets. 
That means a packet from the physical NIC (socket0) may go to a VM on another 
socket (say socket 1). Such cross-NUMA communication would lead to performance 
degradation.

What I am talking about is option 2. Since cross-NUMA communication is not 
avoidable, why not add more CPUs?

Br,
Wang Zhike











Message: 5
Date: Wed, 6 Sep 2017 10:23:53 +0100
From: Kevin Traynor 
To: 王志克 , Darrell Ball ,
"ovs-disc...@openvswitch.org" ,
"ovs-dev@openvswitch.org" 
Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
physicalport
Message-ID: 
Content-Type: text/plain; charset=utf-8

On 09/06/2017 08:03 AM, 王志克 wrote:
> Hi Darrell,
> 
> pmd-rxq-affinity has below limitation: (so isolated pmd can not be used for 
> others, which is not my expectation. Lots of VMs come and go on the fly, and 
> manual assignment is not feasible.)
>   >>After that PMD threads on cores where RX queues was pinned will 
> become isolated. This means that this thread will poll only pinned RX queues
> 
> My problem is that I have several CPUs spreading on different NUMA nodes. I 
> hope all these CPU can have chance to serve the rxq. However, because the phy 
> NIC only locates on one certain socket node, non-same numa pmd/CPU would be 
> excluded. So I am wondering whether we can have different behavior for phy 
> port rxq: 
>   round-robin to all PMDs even the pmd on different NUMA socket.
> 
> I guess this is a common case, and I believe it would improve rx performance.
> 

The issue is that cross numa datapaths incur a large performance penalty
(~2x cycles). This is the reason rxq assignment uses pmds from the same
numa node as the port. Also, any rxqs from other ports that are also
scheduled on the same pmd could suffer as a result of cpu starvation
from that cross-numa assignment.

An issue was that in the case of no pmds available on the correct NUMA
node for a port, it meant that rxqs from that port were not polled at
all. Billy's commit addressed that by allowing cross-numa assignment
*only* in the event of no pmds on the same numa node as the port.

If you look through the threads on Billy's patch you'll see more
discussion on it.

Kevin.


> Br,
> Wang Zhike
> -Original Message-
> From: Darrell Ball [mailto:db...@vmware.com] 
> Sent: Wednesday, September 06, 2017 1:39 PM
To: 王志克; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port
> 
> You could use  pmd-rxq-affinity for the queues you want serviced locally and 
> let the others go remote
> 
On 9/5/17, 8:14 PM, "王志克"  wrote:
> 
> It is a bit different from my expectation.
> 
> 
> 
> I have separate CPU and pmd for each NUMA node. However, the physical NIC 
> only locates on NUMA socket0. So only part of CPU and pmd (the ones in same 
> NUMA node) can poll the physical NIC. Since I have multiple rx queue, I hope 
> part queues can be polled with pmd on same node, others can be polled with 
> pmd on non-local numa node. In this way, we have more pmds contributing to the 
> polling of the physical NIC, so a performance improvement is expected from a total 
> rx traffic point of view.
> 
> 
> 
> Br,
> 
> Wang Zhike
> 
> 
> 
> -Original Message-
> 
> From: Darrell Ball [mailto:db...@vmware.com] 
> 
> Sent: Wednesday, September 06, 2017 10:47 AM
> 
To: 王志克; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
> 
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical 
> port
> 
> 
> 
> This same numa node limitation was already removed, although same numa is 
> preferred for performance reasons.
> 
> 
> 
> commit c37813fdb030b4270d05ad61943754f67021a50d
> 
> Author: Billy O'Mahony 
> 
> Date:   Tue Aug 1 14:38:43 2017 -0700
> 
> 
> 
> dpif-netdev: Assign ports to pmds on non-local numa node.
> 
> 
> 
> Previously if there is no available (non-isolated) pmd on the numa 
> node
> 
> for a port then the port is not polled at all. This can result in a
> 
> non-operational system until 

Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-06 Thread
Hi Darrell,

pmd-rxq-affinity has below limitation: (so isolated pmd can not be used for 
others, which is not my expectation. Lots of VMs come and go on the fly, and 
manual assignment is not feasible.)
  >>After that PMD threads on cores where RX queues was pinned will 
become isolated. This means that this thread will poll only pinned RX queues

My problem is that I have several CPUs spread across different NUMA nodes. I 
hope all these CPUs can have a chance to serve the rxqs. However, because the phy 
NIC only locates on one certain socket node, non-same-NUMA pmds/CPUs would be 
excluded. So I am wondering whether we can have a different behavior for phy port 
rxqs: 
  round-robin to all PMDs, even pmds on a different NUMA socket.

I guess this is a common case, and I believe it would improve rx performance.

Br,
Wang Zhike
-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Wednesday, September 06, 2017 1:39 PM
To: 王志克; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

You could use  pmd-rxq-affinity for the queues you want serviced locally and 
let the others go remote

On 9/5/17, 8:14 PM, "王志克" <wangzh...@jd.com> wrote:

It is a bit different from my expectation.



I have separate CPU and pmd for each NUMA node. However, the physical NIC 
only locates on NUMA socket0. So only part of CPU and pmd (the ones in same 
NUMA node) can poll the physical NIC. Since I have multiple rx queue, I hope 
part queues can be polled with pmd on same node, others can be polled with pmd 
on non-local numa node. In this way, we have more pmds contributing to the polling 
of the physical NIC, so a performance improvement is expected from a total rx 
traffic point of view.



Br,

Wang Zhike



-Original Message-

From: Darrell Ball [mailto:db...@vmware.com] 

Sent: Wednesday, September 06, 2017 10:47 AM

To: 王志克; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org

Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical 
port



This same numa node limitation was already removed, although same numa is 
preferred for performance reasons.



commit c37813fdb030b4270d05ad61943754f67021a50d

Author: Billy O'Mahony <billy.o.mah...@intel.com>

Date:   Tue Aug 1 14:38:43 2017 -0700



dpif-netdev: Assign ports to pmds on non-local numa node.



Previously if there is no available (non-isolated) pmd on the numa node

for a port then the port is not polled at all. This can result in a

non-operational system until such time as nics are physically

repositioned. It is preferable to operate with a pmd on the 'wrong' numa

node albeit with lower performance. Local pmds are still chosen when

available.



Signed-off-by: Billy O'Mahony <billy.o.mah...@intel.com>

Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>

Co-authored-by: Ilya Maximets <i.maxim...@samsung.com>





The sentence “The rx queues are assigned to pmd threads on the same NUMA 
node in a round-robin fashion.”



under



DPDK Physical Port Rx Queues¶



should be removed since it is outdated in a couple of ways and there is 
other correct documentation on the same page

and also here 
https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.openvswitch.org_en_latest_howto_dpdk_=DwIGaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=iNebKvfYjcXbjMsmtLJqThRUImv8W4PRrYWpD-QwUVg=KG3MmQe4QkUkyG3xsCoF6DakFsZh_eg9aEyhYFUKF2c=
 



Maybe you could submit a patch ?



Thanks Darrell





On 9/5/17, 7:18 PM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:



Hi All,







I read below doc about pmd assignment for physical port. I think the 
limitation “on the same NUMA node” may not be efficient.








https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.openvswitch.org_en_latest_intro_install_dpdk_=DwIGaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=pqvCrQwfrcDxvwcpuouzVymiBkev1vHpnOlef-ZMev8=4wch_Q6fqo0stIDE4K2loh0z-dshuligqsrAV_h-QuU=
 



DPDK Physical Port Rx 
Queues¶<https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.openvswitch.org_en_latest_intro_install_dpdk_-23dpdk-2Dphysical-2Dport-2Drx-2Dqueues=DwIGaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=pqvCrQwfrcDxvwcpuouzVymiBkev1vHpnOlef-ZMev8=SexDthg-hfPaGjvjCRjkPPY1kK1NfycLQSDw6WHVA

Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-05 Thread
It is a bit different from my expectation.

I have separate CPU and pmd for each NUMA node. However, the physical NIC only 
locates on NUMA socket0. So only part of CPU and pmd (the ones in same NUMA 
node) can poll the physical NIC. Since I have multiple rx queue, I hope part 
queues can be polled with pmd on same node, others can be polled with pmd on 
non-local numa node. In this way, we have more pmds contributing to the polling of 
the physical NIC, so a performance improvement is expected from a total rx traffic view.

Br,
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Wednesday, September 06, 2017 10:47 AM
To: 王志克; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

This same numa node limitation was already removed, although same numa is 
preferred for performance reasons.

commit c37813fdb030b4270d05ad61943754f67021a50d
Author: Billy O'Mahony <billy.o.mah...@intel.com>
Date:   Tue Aug 1 14:38:43 2017 -0700

dpif-netdev: Assign ports to pmds on non-local numa node.

Previously if there is no available (non-isolated) pmd on the numa node
for a port then the port is not polled at all. This can result in a
non-operational system until such time as nics are physically
repositioned. It is preferable to operate with a pmd on the 'wrong' numa
node albeit with lower performance. Local pmds are still chosen when
available.

Signed-off-by: Billy O'Mahony <billy.o.mah...@intel.com>
Signed-off-by: Ilya Maximets <i.maxim...@samsung.com>
Co-authored-by: Ilya Maximets <i.maxim...@samsung.com>


The sentence “The rx queues are assigned to pmd threads on the same NUMA node 
in a round-robin fashion.”

under

DPDK Physical Port Rx Queues¶

should be removed since it is outdated in a couple of ways and there is other 
correct documentation on the same page
and also here http://docs.openvswitch.org/en/latest/howto/dpdk/

Maybe you could submit a patch ?

Thanks Darrell


On 9/5/17, 7:18 PM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:

Hi All,



I read below doc about pmd assignment for physical port. I think the 
limitation “on the same NUMA node” may not be efficient.




https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.openvswitch.org_en_latest_intro_install_dpdk_=DwIGaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=pqvCrQwfrcDxvwcpuouzVymiBkev1vHpnOlef-ZMev8=4wch_Q6fqo0stIDE4K2loh0z-dshuligqsrAV_h-QuU=
 

DPDK Physical Port Rx 
Queues¶<https://urldefense.proofpoint.com/v2/url?u=http-3A__docs.openvswitch.org_en_latest_intro_install_dpdk_-23dpdk-2Dphysical-2Dport-2Drx-2Dqueues=DwIGaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=pqvCrQwfrcDxvwcpuouzVymiBkev1vHpnOlef-ZMev8=SexDthg-hfPaGjvjCRjkPPY1kK1NfycLQSDw6WHVArQ=
 >



$ ovs-vsctl set Interface  options:n_rxq=



The above command sets the number of rx queues for DPDK physical interface. 
The rx queues are assigned to pmd threads on the same NUMA node in a 
round-robin fashion.

Consider below case:



One host has one PCI NIC on NUMA node 0, and has 4 VMs, which spread in 
NUMA node 0 and 1. There are multiple rx queues configured on the physical NIC. 
We configured 4 pmd (two cpu from NUMA node0, and two cpu from node 1). Since 
the physical NIC locates on NUMA node0, only pmds on same NUMA node can poll 
its rxq. As a result, only two cpu can be used for polling physical NIC.



If we compare the OVS kernel mode, there is no such limitation.



So question:

should we remove the “same NUMA node” limitation for physical port rx queues? 
Or do we have other options to improve the performance for this case?



Br,

Wang Zhike



___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwIGaQ=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=pqvCrQwfrcDxvwcpuouzVymiBkev1vHpnOlef-ZMev8=Whz73vLTYWkBuEL6reD88bkzCgSfqpgb7MDiCG5fB4A=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

2017-09-05 Thread
Hi All,

I read the doc below about pmd assignment for physical ports. I think the limitation 
“on the same NUMA node” may not be efficient.

http://docs.openvswitch.org/en/latest/intro/install/dpdk/
DPDK Physical Port Rx 
Queues¶

$ ovs-vsctl set Interface  options:n_rxq=

The above command sets the number of rx queues for DPDK physical interface. The 
rx queues are assigned to pmd threads on the same NUMA node in a round-robin 
fashion.
Consider below case:

One host has one PCI NIC on NUMA node 0, and has 4 VMs, which spread in NUMA 
node 0 and 1. There are multiple rx queues configured on the physical NIC. We 
configured 4 pmd (two cpu from NUMA node0, and two cpu from node 1). Since the 
physical NIC locates on NUMA node0, only pmds on same NUMA node can poll its 
rxq. As a result, only two cpu can be used for polling physical NIC.

If we compare the OVS kernel mode, there is no such limitation.

So question:
should we remove the “same NUMA node” limitation for physical port rx queues? Or do 
we have other options to improve the performance for this case?

Br,
Wang Zhike

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-dpdk: vhost get stats fix

2017-08-25 Thread
OK, I can separate this.

Regarding this issue, my idea:
   I do not think it is a libvirtd bug. We can have different stats sets, say a 
common set, an extended set and so on, but we should NOT differ at the level of a 
single parameter, since that is too hard for the stats user to parse.
   Now I think it is better to change 
__netdev_dpdk_vhost_send()/netdev_dpdk_vhost_update_tx_counters() to reflect 
tx_errors (set to zero for now since there is no error case). 


I copy more background info as below:


I saw below system log in /var/log/message. I am using libvirtd 3.2.0.

Aug 21 11:21:00 A01-R06-I29-183 libvirtd[24192]: 2017-08-21 03:21:00.057+: 
24198: error : virCommandWait:2572 : internal error: Child process (ovs-vsctl 
--timeout=5 get Interface port-7zel2so9sg statistics:rx_errors 
statistics:rx_dropped statistics:tx_errors statistics:tx_dropped) unexpected 
exit status 1: ovs-vsctl: no key "tx_errors" in Interface record 
"port-7zel2so9sg" column statistics
Aug 21 11:21:00 A01-R06-I29-183 ovs-vsctl: ovs|1|db_ctl_base|ERR|no key 
"tx_errors" in Interface record "port-ij1mlalpxt" column statistics
***

Br.
Wang Zhike
-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Friday, August 25, 2017 2:04 PM
To: 王志克; d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: vhost get stats fix

I am wondering if we should split the 
+stats->tx_errors = 0;
out from this patch and discuss it separately ?

In theory, if a stat is really not supported, we should not display a value for 
it.
Displaying 0 could be misleading if there really is a problem and we are not 
detecting it.

Darrell

On 8/24/17, 7:51 PM, "ovs-dev-boun...@openvswitch.org on behalf of wangzhike" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:

1. "+=" should be "="
2. tx_errors is a generic param, and should be 0 since vhost does not
   create such error.
   Or some app, like libvirt will complain for failure to find this key.

Signed-off-by: wangzhike <wangzh...@jd.com>
---
 lib/netdev-dpdk.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e90fd0e..1c50aa3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2016,14 +2016,15 @@ netdev_dpdk_vhost_get_stats(const struct netdev 
*netdev,
 
 rte_spinlock_lock(>stats_lock);
 /* Supported Stats */
-stats->rx_packets += dev->stats.rx_packets;
-stats->tx_packets += dev->stats.tx_packets;
+stats->rx_packets = dev->stats.rx_packets;
+stats->tx_packets = dev->stats.tx_packets;
 stats->rx_dropped = dev->stats.rx_dropped;
-stats->tx_dropped += dev->stats.tx_dropped;
+stats->tx_dropped = dev->stats.tx_dropped;
 stats->multicast = dev->stats.multicast;
 stats->rx_bytes = dev->stats.rx_bytes;
 stats->tx_bytes = dev->stats.tx_bytes;
 stats->rx_errors = dev->stats.rx_errors;
+stats->tx_errors = 0;
 stats->rx_length_errors = dev->stats.rx_length_errors;
 
 stats->rx_1_to_64_packets = dev->stats.rx_1_to_64_packets;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=pn_VA8IUu6tNFyFWU0igR0Qo4OPeZ8lCpCMjsGYlKA0=1aewY464s93D6GGArf7n8hyc--1TGkrxNBn89LfUNro=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Fix: vhost user port status

2017-08-24 Thread
Hi Darrell,

Thanks for your comment. I just created a new patch accordingly.

Yes, I am testing vhost user client port.
After ovs-vswitchd restarts, I did nothing. I just used command "ovs-ofctl show 
br" to get the port status, and found the status is "LINK DOWN".
With the patch, the status is 0 as expected.

Br,
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Friday, August 25, 2017 9:35 AM
To: 王志克; d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] Fix: vhost user port status

Hi Lawrence 

I am not very particular about the title prefix, but could we use something more
technically specific than ‘Fix’; maybe netdev-dpdk  ?
Title should also end with a period.

On 8/23/17, 9:13 PM, "ovs-dev-boun...@openvswitch.org on behalf of wangzhike" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:

After ovs-vswitchd reboots, vhost user status is displayed as
LINK DOWN though the traffic is OK.

The problem is that the port may be updated while the vhost_reconfigured
is false. Then the vhost_reconfigured is updated to true. As a result,
the vhost user status is kept as LINK-DOWN.

[Darrell]
Just so I understand, are you actually using vhost-user or really 
vhost-user-client ports ?
JTBS, can you describe what you did after vswitchd restart ?


Signed-off-by: wangzhike <wangzh...@jd.com>
---
 lib/netdev-dpdk.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 1aaf6f7..e90fd0e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -3227,7 +3227,11 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk 
*dev)
 }
 
 if (netdev_dpdk_get_vid(dev) >= 0) {
-dev->vhost_reconfigured = true;
+if (dev->vhost_reconfigured == false) {
+dev->vhost_reconfigured = true;
+/* change vhost_reconfigured may affect carrier status */

[Darrell]
Maybe we can say ?
/* Carrier status may need updating. */
Would this be ok ?
BTW, comments should end with a period.

+netdev_change_seq_changed(>up);
+}
 }
 
 return 0;
-- 
1.8.3.1

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=TOaqootDYDg45mYkwMOQnfh-LiXCCGZAYC37gEf9aWE=5_PkYdvwnY5DuxfvSrVJ2WbjCGRVTpwE6kjLs6FLbzI=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-discuss] OVS+DPDK QoS rate limit issue

2017-08-24 Thread
Hi Lance,

Your patch works. Thanks.

BR,
Wang Zhike

-Original Message-
From: Lance Richardson [mailto:lrich...@redhat.com] 
Sent: Thursday, August 24, 2017 8:10 PM
To: 王志克
Cc: ovs-dev@openvswitch.org; ovs-disc...@openvswitch.org
Subject: Re: [ovs-discuss] OVS+DPDK QoS rate limit issue


> From: "王志克" <wangzh...@jd.com>
> To: ovs-dev@openvswitch.org, ovs-disc...@openvswitch.org
> Sent: Wednesday, August 23, 2017 11:41:05 PM
> Subject: [ovs-discuss] OVS+DPDK QoS rate limit issue
> 
> 
> 
> Hi All,
> 
> 
> 
> I am using OVS2.7.0 and DPDK 16.11, and testing rate limit function.
> 
> 
> 
> I found that if the policing_rate is set very large, say 5Gbps, the rate is
> limited dramatically to very low value, like 800Mbps.
> 
> The command is as below:
> 
> ovs-vsctl set interface port-7zel2so9sg ingress_policing_rate=500
> ingress_policing_burst=50
> 
> 
> 
> If we set the rate lower than 4Gbps, the rate is limited correctly.
> 
> 
> 
> Test setup:
> 
> Sender (DPDK pktGen) sends out about 10Gbps udp packet, with size about 1420
> IP size.
> 
> The rate limit is set on VM vhost-user-client port.
> 
> 
> 
> Any idea about this issue? Is that known issue?
> 
> 

It seems 32-bit arithmetic is being used when converting the rate from
kilobits per second to bytes per second. Could you give this patch a try?

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 1aaf6f7e2..d6ed2c7b0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2229,8 +2229,8 @@ netdev_dpdk_policer_construct(uint32_t rate, uint32_t 
burst)
     rte_spinlock_init(>policer_lock);
 
     /* rte_meter requires bytes so convert kbits rate and burst to bytes. */
-    rate_bytes = rate * 1000/8;
-    burst_bytes = burst * 1000/8;
+    rate_bytes = rate * 1000ULL/8;
+    burst_bytes = burst * 1000ULL/8;
 
     policer->app_srtcm_params.cir = rate_bytes;
     policer->app_srtcm_params.cbs = burst_bytes;
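
To make the failure mode concrete (my own arithmetic below, not part of the 
patch): with ingress_policing_rate=5000000 the intermediate product rate * 1000 
is about 5e9, which wraps around a 32-bit value and leaves roughly 705 Mbit/s, 
close to the ~800 Mbps you observed. A standalone sketch:

    #include <stdint.h>
    #include <stdio.h>

    /* Standalone illustration of the 32-bit wrap-around, not OVS code. */
    int
    main(void)
    {
        uint32_t rate = 5000000;               /* kbit/s, as configured */
        uint32_t wrapped = rate * 1000 / 8;    /* product wraps modulo 2^32 */
        uint64_t correct = rate * 1000ULL / 8; /* widened, as in the patch */

        printf("wrapped: %u bytes/s (~%u Mbit/s)\n",
               (unsigned) wrapped, (unsigned) (wrapped / 125000));
        printf("correct: %llu bytes/s (5000 Mbit/s)\n",
               (unsigned long long) correct);
        return 0;
    }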

Regards,

   Lance Richardson

> 
> Br,
> 
> Wang Zhike
> 
> ___
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] vhost user port is displayed as LINK DOWN after ovs-vswitchd reboot

2017-08-23 Thread
Thanks Darrell.

I just send it out via git send-email.

Br,
Wang Zhike (lawrence)

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: Wednesday, August 23, 2017 8:48 AM
To: 王志克; d...@openvswitch.org
Subject: Re: [ovs-dev] vhost user port is displayed as LINK DOWN after 
ovs-vswitchd reboot

This looks reasonable at first glance Lawrence

Would you like to try git format-patch and git send-email ?
It would make it easier in the long term.

Thanks Darrell

On 8/21/17, 6:00 AM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:

Hi,

I create a pull request, regarding the vhost user port status.

The problem is that the port may be updated while the vhost_reconfigured is 
false. Then the vhost_reconfigured is updated.
As a result, the vhost user status is kept as LINK-DOWN. Note the traffic 
is OK in this case. Only the status is wrong.


https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitch_ovs_pull_198=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=AM5cxUEtfcit4Rc-vVRud4qt3gZ4cRk19IR_fqiEppM=e55NLote0ZavherSDH6l2VDVlSIpYJ4df-RVDfw1t-U=
 

Thanks.

Br,
Lawrence

___
dev mailing list
d...@openvswitch.org

https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev=DwICAg=uilaK90D4TOVoH58JNXRgQ=BVhFA09CGX7JQ5Ih-uZnsw=AM5cxUEtfcit4Rc-vVRud4qt3gZ4cRk19IR_fqiEppM=zTr1Xq-O5yLDfASgQ1Gs9ZYPrhuDk3kayv2fOAznbK4=
 


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] vhost user port is displayed as LINK DOWN after ovs-vswitchd reboot

2017-08-21 Thread
Hi,

I create a pull request, regarding the vhost user port status.

The problem is that the port may be updated while the vhost_reconfigured is 
false. Then the vhost_reconfigured is updated.
As a result, the vhost user status is kept as LINK-DOWN. Note the traffic is OK 
in this case. Only the status is wrong.

https://github.com/openvswitch/ovs/pull/198

Thanks.

Br,
Lawrence

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

2017-07-24 Thread
Thanks Joe.

BR,
Wang Zhike

-Original Message-
From: Joe Stringer [mailto:j...@ovn.org] 
Sent: Saturday, July 22, 2017 2:29 AM
To: 王志克
Cc: ovs dev; Ben Pfaff
Subject: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs 
reassemble

On 6 July 2017 at 13:57, Ben Pfaff <b...@ovn.org> wrote:
> From: wangzhike <wangzh...@jd.com>
>
> Ovs and kernel stack would add frag_queue to same netns_frags list.
> As result, ovs and kernel may access the fraq_queue without correct
> lock. Also the struct ipq may be different on kernel(older than 4.3),
> which leads to invalid pointer access.
>
> The fix creates specific netns_frags for ovs.
>
> Signed-off-by: wangzhike <wangzh...@jd.com>
> ---

Hi,

Thanks a lot for your hard work on this. I know through the several
revisions of private review we did, you considered several options for
how to better structure this, and I don't see a straightforward way to
improve it any further.

I've run this patch dozens of times on a variety of kernel versions,
and at this point I am reasonably confident that it doesn't make the
current fragmentation situation worse. Given that it
fixes a problem that you have been hitting, I think that the best way
for this to get more testing is for us to apply it. If there are
future issues with this code, then we can always deal with
that at a later time. Given the nature of bugs that affect this area
of the code (that is to say, racy), I am not currently intending to
backport this patch to earlier branches. It will however become part
of the upcoming OVS 2.8 release.

I applied this patch to master.

Thanks,
Joe
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Re: Re: Re: [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

2017-07-06 Thread
Hi Greg,

Any progress?

Thanks.

Br,
Wang Zhike

-Original Message-
From: Greg Rose [mailto:gvrose8...@gmail.com] 
Sent: Friday, June 30, 2017 1:23 AM
To: 王志克
Cc: d...@openvswitch.org; Joe Stringer
Subject: Re: Re: [ovs-dev] Re: Re: [PATCH] pkt reassemble: fix kernel panic for 
ovs reassemble

On 06/28/2017 05:53 PM, 王志克 wrote:
> Hi Greg,
> 
> I just downloaded the official tarball:
> wget http://openvswitch.org/releases/openvswitch-2.6.0.tar.gz
> 
> Then I compiled as below (I do not see any compile issue):
> 
> ./configure --with-linux=/lib/modules/$(uname -r)/build
> make
> make install
> make modules_install
> 
> Br,
> Wang Zhike

Weird... below is what I get at the compile phase when I follow the same steps. 
 Let me
try a completely fresh installation of Centos 7.2 on a new VM.  Perhaps 
something has muddled
the build environments for the VM I'm using.

I'll try that and see if I can get something going.

Thanks,

- Greg

In file included from include/net/inet_sock.h:24:0,
  from include/net/ip.h:30,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/net/ip.h:4,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netfilter_ipv6.h:7,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/actions.c:25:
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netdevice.h:125:34:
 error: conflicting types for 
‘netdev_notifier_info_to_dev’
  static inline struct net_device *netdev_notifier_info_to_dev(void *info)
   ^
In file included from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netdevice.h:4:0,
  from include/net/inet_sock.h:24,
  from include/net/ip.h:30,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/net/ip.h:4,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netfilter_ipv6.h:7,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/actions.c:25:
include/linux/netdevice.h:2257:1: note: previous definition of 
‘netdev_notifier_info_to_dev’ was here
  netdev_notifier_info_to_dev(const struct netdev_notifier_info *info)
  ^
In file included from include/uapi/linux/if_arp.h:26:0,
  from include/linux/if_arp.h:27,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/datapath.c:23:
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netdevice.h:125:34:
 error: conflicting types for 
‘netdev_notifier_info_to_dev’
  static inline struct net_device *netdev_notifier_info_to_dev(void *info)
   ^
In file included from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netdevice.h:4:0,
  from include/uapi/linux/if_arp.h:26,
  from include/linux/if_arp.h:27,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/datapath.c:23:
include/linux/netdevice.h:2257:1: note: previous definition of 
‘netdev_notifier_info_to_dev’ was here
  netdev_notifier_info_to_dev(const struct netdev_notifier_info *info)
  ^
In file included from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/dp_notify.c:19:0:
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netdevice.h:125:34:
 error: conflicting types for 
‘netdev_notifier_info_to_dev’
  static inline struct net_device *netdev_notifier_info_to_dev(void *info)
   ^
In file included from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netdevice.h:4:0,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/dp_notify.c:19:
include/linux/netdevice.h:2257:1: note: previous definition of 
‘netdev_notifier_info_to_dev’ was here
  netdev_notifier_info_to_dev(const struct netdev_notifier_info *info)
  ^
In file included from include/net/sock.h:51:0,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/net/sock.h:4,
  from include/linux/tcp.h:23,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/conntrack.c:21:
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netdevice.h:125:34:
 error: conflicting types for 
‘netdev_notifier_info_to_dev’
  static inline struct net_device *netdev_notifier_info_to_dev(void *info)
   ^
In file included from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/linux/netdevice.h:4:0,
  from include/net/sock.h:51,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/compat/include/net/sock.h:4,
  from include/linux/tcp.h:23,
  from 
/home/gvrose/prj/openvswitch-2.6.0/datapath/linux/conntrack.c:21:
include/linux/net

Re: [ovs-dev] Re: Re: Re: [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

2017-06-28 Thread
Hi Greg,

I just downloaded the official tarball:
wget http://openvswitch.org/releases/openvswitch-2.6.0.tar.gz

Then I compiled as below (I do not see any compile issue):

./configure --with-linux=/lib/modules/$(uname -r)/build
make
make install
make modules_install

Br,
Wang Zhike
-Original Message-
From: Greg Rose [mailto:gvrose8...@gmail.com] 
Sent: Thursday, June 29, 2017 4:29 AM
To: 王志克
Cc: d...@openvswitch.org; Joe Stringer
Subject: Re: 答复: [ovs-dev] 答复: 答复: [PATCH] pkt reassemble: fix kernel panic for 
ovs reassemble

On 06/26/2017 05:51 PM, 王志克 wrote:
> Hi Greg,
> 
> The exact issue occurred on the 20th test of check-kmod (sometimes there are other 
> kernel issues: the kernel just hangs without a panic). OVS 2.6.0 on CentOS 7.2 
> with kernel 3.10.0-327.el7.x86_64. Some info below, which I hope is helpful.
> 
> datapath-sanity
> 
>1: datapath - ping between two ports   ok
>2: datapath - http between two ports   ok
>3: datapath - ping between two ports on vlan   ok
>4: datapath - ping6 between two ports  ok
>5: datapath - ping6 between two ports on vlan  ok
>6: datapath - ping over vxlan tunnel   FAILED 
> (system-traffic.at:159)
>7: datapath - ping over gre tunnel FAILED 
> (system-traffic.at:199)
>8: datapath - ping over geneve tunnel  skipped 
> (system-traffic.at:213)
>9: datapath - basic truncate actionok
>   10: datapath - truncate and output to gre tunnelFAILED 
> (system-traffic.at:445)
>   11: conntrack - controller  FAILED 
> (system-traffic.at:522)
>   12: conntrack - IPv4 HTTP   ok
>   13: conntrack - IPv6 HTTP   ok
>   14: conntrack - IPv4 ping   ok
>   15: conntrack - IPv6 ping   ok
>   16: conntrack - commit, recirc  ok
>   17: conntrack - preserve registers  ok
>   18: conntrack - invalid ok
>   19: conntrack - zones   ok
>   20: conntrack - zones from field (system crash...)
> 
> 

[snipped]

Hi Wang,

I am having some definite problems trying to get this to repro.  I can't even 
get
openvswitch-2.6.0 to build.  I am running into numerous compatibility layer 
issues
with netfilter and the net_ns () code that prevent compilation, much
less getting any check-kmod tests to run.  It's a complete mess.

Can you point me to a link with an openvswitch 2.6 tarball that builds on your 
Centos7.2
3.10.0-327.el7.x86_64  kernel?

I'm building on Centos 7.2 as well - using the 3.10.0-514.el7.x86_64 kernel 
myself but that shouldn't
matter.  Or if it does then that is an important detail.

Thanks,

- Greg
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Re: [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

2017-06-26 Thread
Hi Joe, Greg,

I tried to create a pull request, please check whether it works. Thanks.

https://github.com/openvswitch/ovs/pull/187

Br,
Wang Zhike
-Original Message-
From: Joe Stringer [mailto:j...@ovn.org] 
Sent: Saturday, June 24, 2017 5:15 AM
To: 王志克
Cc: d...@openvswitch.org
Subject: Re: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs 
reassemble

Hi Wang Zhike,

I'd like if others like Greg could take a look as well, since this
code is delicate. The more review it gets, the better. It seems like
maybe the version of your email that goes to the list does not get the
attachment. Perhaps you could try sending the patch using git
send-email or putting the patch on GitHub instead, and linking to it
here.

For what it's worth, I did run your patch for a while and it seemed
OK, but when I tried again today on an Ubuntu Trusty (Linux
3.13.0-119-generic) box, running make check-kmod, I saw an issue with
get_next_timer_interrupt():

[181250.892557] BUG: unable to handle kernel paging request at a03317e0
[181250.892557] IP: [] get_next_timer_interrupt+0x86/0x250
[181250.892557] PGD 1c11067 PUD 1c12063 PMD 1381a2067 PTE 0
[181250.892557] Oops:  [#1] SMP
[181250.892557] Modules linked in: nf_nat_ipv6 nf_nat_ipv4 nf_nat
gre(-) nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv6
nf_defrag_ipv4 nf_conntrack_netlink nfnetlink nf_conntrack bonding
8021q garp stp mrp llc veth nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
fscache dm_crypt kvm_intel kvm serio_raw netconsole configfs
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy
ahci libahci [last unloaded: libcrc32c]
[181250.892557] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OX
3.13.0-119-generic #166-Ubuntu
[181250.892557] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Bochs 01/01/2011
[181250.892557] task: 81c15480 ti: 81c0 task.ti:
81c0
[181250.892557] RIP: 0010:[]  []
get_next_timer_interrupt+0x86/0x250
[181250.892557] RSP: 0018:81c01e00  EFLAGS: 00010002
[181250.892557] RAX: a03317c8 RBX: 000102b245da RCX:
00db
[181250.892557] RDX: 81ebac58 RSI: 00db RDI:
000102b245db
[181250.892557] RBP: 81c01e48 R08: 00c88c1c R09:

[181250.892557] R10:  R11:  R12:
000142b245d9
[181250.892557] R13: 81eb9e80 R14: 000102b245da R15:
00cd63e8
[181250.892557] FS:  () GS:88013fc0()
knlGS:
[181250.892557] CS:  0010 DS:  ES:  CR0: 8005003b
[181250.892557] CR2: a03317e0 CR3: 3707f000 CR4:
06f0
[181250.892557] Stack:
[181250.892557]   81c01e30 810a3af5
88013fc13bc0
[181250.892557]  88013fc0dce0 000102b245da 
0063ae154000
[181250.892557]  00cd63e8 81c01ea8 810da655
a4d8c2cb6200
[181250.892557] Call Trace:
[181250.892557]  [] ? set_next_entity+0x95/0xb0
[181250.892557]  [] tick_nohz_stop_sched_tick+0x1e5/0x340
[181250.892557]  [] __tick_nohz_idle_enter+0xa1/0x160
[181250.892557]  [] tick_nohz_idle_enter+0x3d/0x70
[181250.892557]  [] cpu_startup_entry+0x87/0x2b0
[181250.892557]  [] rest_init+0x77/0x80
[181250.892557]  [] start_kernel+0x432/0x43d
[181250.892557]  [] ? repair_env_string+0x5c/0x5c
[181250.892557]  [] ? early_idt_handler_array+0x120/0x120
[181250.892557]  [] x86_64_start_reservations+0x2a/0x2c
[181250.892557]  [] x86_64_start_kernel+0x143/0x152
[181250.892557] Code: 8b 7d 10 4d 8b 75 18 4c 39 f7 78 5c 40 0f b6 cf
89 ce 48 63 c6 48 c1 e0 04 49 8d 54 05 00 48 8b 42 28 48 83 c2 28 48
39 d0 74 0e  40 18 01 74 24 48 8b 00 48 39 d0 75 f2 83 c6 01 40 0f
b6 f6
[181250.892557] RIP  [] get_next_timer_interrupt+0x86/0x250
[181250.892557]  RSP 
[181250.892557] CR2: a03317e0

It seems like perhaps a fragment timer signed up by OVS is still
remaining when the OVS module is unloaded, so it may attempt to clean
up an entry using OVS code but the OVS code has been unloaded at that
point. This might be related to IPv6 cvlan test - that seems to be
where my VM froze and went to 100% CPU, but I would think that the
IPv6 fragmentation cleanup test is a more likely to cause this, since
it leaves fragments behind in the cache after the test finishes. I've
only hit this when running all of the tests in make check-kmod.
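
For illustration, a minimal kernel-style C sketch of the hazard described above:
a pending fragment timer has to be cancelled (and the cached entry dropped)
before the module code that owns the handler goes away. The frag_entry structure
and helpers are invented for the sketch (old pre-4.15 timer API assumed); they
are not the actual OVS compat code.

#include <linux/timer.h>
#include <linux/slab.h>

/* Invented stand-in for one cached fragment queue owned by the OVS module. */
struct frag_entry {
    struct timer_list timer;   /* expiry handler lives in module text */
};

static void frag_expire(unsigned long data)
{
    struct frag_entry *e = (struct frag_entry *)data;
    /* Frees the queue on timeout; if this ever runs after the module has been
     * unloaded, the CPU jumps into freed code, which is the kind of paging
     * request failure shown in the trace above. */
    kfree(e);
}

static void frag_entry_init(struct frag_entry *e)
{
    setup_timer(&e->timer, frag_expire, (unsigned long)e);
}

static void frag_entry_destroy(struct frag_entry *e)
{
    /* Cancel the timer and wait for any handler already running, so nothing
     * can fire into module code once the module teardown continues. */
    del_timer_sync(&e->timer);
    kfree(e);
}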

Cheers,
Joe

On 22 June 2017 at 17:53, 王志克 <wangzh...@jd.com> wrote:
> Hi Joe,
>
> Please check the attachment. Thanks.
>
> Br,
> Wang Zhike
>
> -Original Message-
> From: Joe Stringer [mailto:j...@ovn.org]
> Sent: June 23, 2017 8:20
> To: 王志克
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs reassemble
>
> On 21 June 2017 at 18:54, 王志克 <wangzh...@jd.com> wrote:
>> Ovs and kernel stack w

[ovs-dev] Re: Re: Re: [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

2017-06-26 Thread
: 003407e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Stack:
 8800b9f5e780  8800b9a8dfd8 8800b9a8fe10
 8800b9a8fe48 20cc1170855d3261 8800bb62dbc0 0863b6f08300
 0001 8800bb62cf00 000100882ccd 8800b9a8fe88
Call Trace:
 [] tick_nohz_stop_sched_tick+0x1e8/0x2e0
 [] ? native_sched_clock+0x35/0x80
 [] __tick_nohz_idle_enter+0x9e/0x150
 [] tick_nohz_idle_enter+0x3d/0x70
 [] cpu_startup_entry+0x9e/0x290
 [] start_secondary+0x1ba/0x230
Code: 18 49 8b 7e 10 48 39 cf 48 89 ca 78 5a 40 0f b6 d7 89 d6 48 63 c6 48 c1 
e0 04 49 8d 0c 06 48 8b 41 28 48 83 c1 28 48 39 c8 74 0e  40
   
18 01 74 23 48 8b 00 48 39 c8 75 f2 83 c6 01 40 0f b6 f6
RIP  [] get_next_timer_interrupt+0x97/0x270
 RSP 


Wang Zhike

-Original Message-
From: Greg Rose [mailto:gvrose8...@gmail.com]
Sent: June 27, 2017 6:26
To: 王志克
Cc: d...@openvswitch.org; Joe Stringer
Subject: Re: [ovs-dev] Re: Re: [PATCH] pkt reassemble: fix kernel panic for ovs 
reassemble

On 06/26/2017 04:56 AM, 王志克 wrote:
> Hi Joe,
>
> I will try to check how to send the patch. Maybe tomorrow since I am quite 
> busy now.
>
> Regarding the crash, I can reproduce it even with official OVS, like 
> ovs2.6.0. (I just run the check kmod in a loop until kernel panic). So it is 
> not related to the new fix.
>
> Br,
> Wang Zhike
I've been running 'make check-kmod' in a continuous loop on 3 virtual machines 
since this morning.  So far no kernel splats but plenty of errors:

This is on the Ubuntu machine running 4.0 kernel:

ERROR: 66 tests were run,
24 failed unexpectedly.
23 tests were skipped.
## -- ## ## system-kmod-testsuite.log was 
created. ## ## -- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

To: <b...@openvswitch.org>
   Subject: [openvswitch 2.7.90] system-kmod-testsuite: 16 17 35 57 58 59 
60 61 62 63 70 71 72 75 76 81 82 83 84 85 86 87 88 89 failed

Centos 7.2 running 4.9.24 kernel:

## - ##
## Test results. ##
## - ##

ERROR: 76 tests were run,
34 failed unexpectedly.
13 tests were skipped.
## -- ## ## system-kmod-testsuite.log was 
created. ## ## -- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

To: <b...@openvswitch.org>
   Subject: [openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 23 
24 25 26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84 85 86 
87 failed

Centos 7.2 running 4.10.17 kernel:

## - ##
## Test results. ##
## - ##

ERROR: 74 tests were run,
34 failed unexpectedly.
15 tests were skipped.
## -- ## ## system-kmod-testsuite.log was 
created. ## ## -- ##

Please send `tests/system-kmod-testsuite.log' and all information you think 
might help:

To: <b...@openvswitch.org>
   Subject: [openvswitch 2.7.90] system-kmod-testsuite: 2 14 15 20 21 22 23 
24 25 26 27 28 29 30 31 32 47 48 49 50 51 57 59 60 61 62 70 71 75 76 84 85 86 
87 failed

I confess to not spending a lot of time running check-kmod.  I certainly intend 
to in the future.

- Greg

>
> -Original Message-
> From: Joe Stringer [mailto:j...@ovn.org]
> Sent: June 24, 2017 5:15
> To: 王志克
> Cc: d...@openvswitch.org
> Subject: Re: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs 
> reassemble
>
> Hi Wang Zhike,
>
> I'd like if others like Greg could take a look as well, since this code is 
> delicate. The more review it gets, the better. It seems like maybe the 
> version of your email that goes to the list does not get the attachment. 
> Perhaps you could try sending the patch using git send-email or putting the 
> patch on GitHub instead, and linking to it here.
>
> For what it's worth, I did run your patch for a while and it seemed 
> OK, but when I tried again today on an Ubuntu Trusty (Linux
> 3.13.0-119-generic) box, running make check-kmod, I saw an issue with
> get_next_timer_interrupt():
>
> [181250.892557] BUG: unable to handle kernel paging request at 
> a03317e0 [181250.892557] IP: [] 
> get_next_timer_interrupt+0x86/0x250
> [181250.892557] PGD 1c11067 PUD 1c12063 PMD 1381a2067 PTE 0 
> [181250.892557] Oops:  [#1] SMP [181250.892557] Modules linked in: 
> nf_nat_ipv6 nf_nat_ipv4 nf_nat
> gre(-) nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv6
> nf_defrag_ipv4 nf_conntrack_netlink nfnetlink nf_conntrack bonding 
> 8021q garp stp mrp llc veth nfsd auth_rpcgss nfs_acl nfs lockd sunrpc 
> fscache dm_crypt kvm_intel k

[ovs-dev] Re: Re: [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

2017-06-26 Thread
Hi Joe,

I will try to check how to send the patch. Maybe tomorrow since I am quite busy 
now.

Regarding the crash, I can reproduce it even with official OVS, like ovs2.6.0. 
(I just run the check kmod in a loop until kernel panic). So it is not related 
to the new fix.

Br,
Wang Zhike

-Original Message-
From: Joe Stringer [mailto:j...@ovn.org]
Sent: June 24, 2017 5:15
To: 王志克
Cc: d...@openvswitch.org
Subject: Re: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs 
reassemble

Hi Wang Zhike,

I'd like if others like Greg could take a look as well, since this code is 
delicate. The more review it gets, the better. It seems like maybe the version 
of your email that goes to the list does not get the attachment. Perhaps you 
could try sending the patch using git send-email or putting the patch on GitHub 
instead, and linking to it here.

For what it's worth, I did run your patch for a while and it seemed OK, but 
when I tried again today on an Ubuntu Trusty (Linux
3.13.0-119-generic) box, running make check-kmod, I saw an issue with
get_next_timer_interrupt():

[181250.892557] BUG: unable to handle kernel paging request at a03317e0 
[181250.892557] IP: [] get_next_timer_interrupt+0x86/0x250
[181250.892557] PGD 1c11067 PUD 1c12063 PMD 1381a2067 PTE 0 [181250.892557] 
Oops:  [#1] SMP [181250.892557] Modules linked in: nf_nat_ipv6 nf_nat_ipv4 
nf_nat
gre(-) nf_conntrack_ipv6 nf_conntrack_ipv4 nf_defrag_ipv6
nf_defrag_ipv4 nf_conntrack_netlink nfnetlink nf_conntrack bonding 8021q garp 
stp mrp llc veth nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt 
kvm_intel kvm serio_raw netconsole configfs crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse floppy ahci 
libahci [last unloaded: libcrc32c]
[181250.892557] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   OX
3.13.0-119-generic #166-Ubuntu
[181250.892557] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Bochs 01/01/2011 [181250.892557] task: 81c15480 ti: 81c0 
task.ti:
81c0
[181250.892557] RIP: 0010:[]  []
get_next_timer_interrupt+0x86/0x250
[181250.892557] RSP: 0018:81c01e00  EFLAGS: 00010002 [181250.892557] 
RAX: a03317c8 RBX: 000102b245da RCX:
00db
[181250.892557] RDX: 81ebac58 RSI: 00db RDI:
000102b245db
[181250.892557] RBP: 81c01e48 R08: 00c88c1c R09:

[181250.892557] R10:  R11:  R12:
000142b245d9
[181250.892557] R13: 81eb9e80 R14: 000102b245da R15:
00cd63e8
[181250.892557] FS:  () GS:88013fc0()
knlGS:
[181250.892557] CS:  0010 DS:  ES:  CR0: 8005003b 
[181250.892557] CR2: a03317e0 CR3: 3707f000 CR4:
06f0
[181250.892557] Stack:
[181250.892557]   81c01e30 810a3af5
88013fc13bc0
[181250.892557]  88013fc0dce0 000102b245da 
0063ae154000
[181250.892557]  00cd63e8 81c01ea8 810da655
a4d8c2cb6200
[181250.892557] Call Trace:
[181250.892557]  [] ? set_next_entity+0x95/0xb0 
[181250.892557]  [] tick_nohz_stop_sched_tick+0x1e5/0x340
[181250.892557]  [] __tick_nohz_idle_enter+0xa1/0x160 
[181250.892557]  [] tick_nohz_idle_enter+0x3d/0x70 
[181250.892557]  [] cpu_startup_entry+0x87/0x2b0 
[181250.892557]  [] rest_init+0x77/0x80 [181250.892557]  
[] start_kernel+0x432/0x43d [181250.892557]  
[] ? repair_env_string+0x5c/0x5c [181250.892557]  
[] ? early_idt_handler_array+0x120/0x120
[181250.892557]  [] x86_64_start_reservations+0x2a/0x2c
[181250.892557]  [] x86_64_start_kernel+0x143/0x152 
[181250.892557] Code: 8b 7d 10 4d 8b 75 18 4c 39 f7 78 5c 40 0f b6 cf
89 ce 48 63 c6 48 c1 e0 04 49 8d 54 05 00 48 8b 42 28 48 83 c2 28 48
39 d0 74 0e  40 18 01 74 24 48 8b 00 48 39 d0 75 f2 83 c6 01 40 0f
b6 f6
[181250.892557] RIP  [] get_next_timer_interrupt+0x86/0x250
[181250.892557]  RSP 
[181250.892557] CR2: a03317e0

It seems like perhaps a fragment timer signed up by OVS is still remaining when 
the OVS module is unloaded, so it may attempt to clean up an entry using OVS 
code but the OVS code has been unloaded at that point. This might be related to 
IPv6 cvlan test - that seems to be where my VM froze and went to 100% CPU, but 
I would think that the
IPv6 fragmentation cleanup test is a more likely to cause this, since it leaves 
fragments behind in the cache after the test finishes. I've only hit this when 
running all of the tests in make check-kmod.

Cheers,
Joe

On 22 June 2017 at 17:53, 王志克 <wangzh...@jd.com> wrote:
> Hi Joe,
>
> Please check the attachment. Thanks.
>
> Br,
> Wang Zhike
>
> -Original Message-
> From: Joe Stringer [mailto:j...@ovn.org]
> Sent: June 23, 2017 8:20
> To: 王志克
> Cc: d...@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] pkt reassemble: fix kern

[ovs-dev] Re: Re: [ovs-discuss] rpmbuild failure for ovs_dpdk

2017-06-22 Thread
Thanks Darrell.

I just want to confirm whether static linking of the dpdk library is supported
for the rpm build. From the discussion below, it seems only dynamic linking is supported.

https://mail.openvswitch.org/pipermail/ovs-dev/2015-November/306324.html

@Flavio,
  Appreciate your idea.

Br,
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 22, 2017 22:08
To: 王志克; ovs-disc...@openvswitch.org; ovs-dev@openvswitch.org
Subject: Re: Re: [ovs-discuss] rpmbuild failure for ovs_dpdk

This may not be the best list to help here.
Maybe a Centos mailing list, or possibly one of the gnu mailing lists.

Some information that may be useful for your email to one of these lists:

What version of glibc are you using
rpm -q glibc
in your case

What version of libtool
libtool --version

What version of gcc
gcc --version

Darrell


On 6/22/17, 1:06 AM, "王志克" <wangzh...@jd.com> wrote:

Hi,

    I saw the output below when executing rpmbuild. I guess it is the root 
cause, since it refuses to link the static lib.

checking if gcc -std=gnu99 static flag -static works... no


    I tested that if I compile dpdk as a shared library, then the ovs 
rpmbuild will succeed.

    However, even if I installed glibc-static,
 1) rpmbuild still reports that -static is not supported: "checking if 
gcc -std=gnu99 static flag -static works... no"
 2) a standalone run of ./configure --with-dpdk=xxx reports: "checking if 
gcc -std=gnu99 static flag -static works... yes". Note that previously, when I 
had not installed glibc-static, the standalone ./configure could still succeed 
although it reported "static ... no".

So the question:
why can rpmbuild not correctly detect the -static flag for gcc?

Br,
Wang Zhike
-Original Message-
From: 王志克
Sent: June 22, 2017 9:31
To: 'Darrell Ball'; 'ovs-disc...@openvswitch.org'; ovs-dev@openvswitch.org
Subject: Re: [ovs-discuss] rpmbuild failure for ovs_dpdk

Hi Darrell,

I checked my config.status, and it was correct.

./config.status:S["OVS_LDFLAGS"]=" 
-L/usr/share/dpdk/x86_64-default-linuxapp-gcc/lib"

Any more idea about the failure? Did someone successfully build it? Thanks

BTW, I am using Centos 7.2
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 21, 2017 0:14
To: 王志克; ovs-dev@openvswitch.org; disc...@openvswitch.org
Subject: Re: [ovs-discuss] [ovs-dev] rpmbuild failure for ovs_dpdk

Correction: ovs-disc...@openvswitch.org

On 6/20/17, 9:01 AM, "ovs-discuss-boun...@openvswitch.org on behalf of 
Darrell Ball" <ovs-discuss-boun...@openvswitch.org on behalf of 
db...@vmware.com> wrote:

Again, send to disc...@openvswitch.org



Do you see something like this ?



darrell@prmh-nsx-perf-server125:~/ovs/ovs_master$ grep -nr LDFLAGS 
_gcc/config.status 

846:S["OVS_LDFLAGS"]=" 
-L/usr/src/dpdk-16.11/x86_64-native-linuxapp-gcc/lib"



with dpdk-16.07 instead of dpdk-16.11

.

.

.
    


On 6/20/17, 5:43 AM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:



Hi All,



I try to build rpm for ovs+dpdk, but met below compiling issue. 
Does someone know how to fix it? I guess it is related to LDFLAGS='-Wl,-z,relro 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld, but no idea how to fix it.



If I follow below guide (non-rpm), everything is OK.


http://docs.openvswitch.org/en/latest/intro/install/dpdk/



I use dpdk-16.07 and ovs 2.6 without any change.



[root@A01-R06-I187-15 openvswitch-2.6.0]# rpmbuild -bb --without 
check --with dpdk rhel/openvswitch-fedora.spec

Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.PBk26t

+ umask 022

+ cd /root/rpmbuild/BUILD

+ cd /root/rpmbuild/BUILD

+ rm -rf openvswitch-2.6.0

+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/openvswitch-2.6.0.tar.gz

+ /usr/bin/tar -xf -

+ STATUS=0

+ '[' 0 -ne 0 ']'

+ cd openvswitch-2.6.0

+ /usr/bin/chmod -Rf a+rX,

[ovs-dev] Re: [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

2017-06-22 Thread
Hi Joe,

Please check the attachment. Thanks.

Br,
Wang Zhike

-Original Message-
From: Joe Stringer [mailto:j...@ovn.org]
Sent: June 23, 2017 8:20
To: 王志克
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

On 21 June 2017 at 18:54, 王志克 <wangzh...@jd.com> wrote:
> Ovs and the kernel stack would add frag_queue entries to the same netns_frags list.
> As a result, ovs and the kernel may access the frag_queue without the correct 
> lock. Also the struct ipq layout may differ on kernels older than 4.3, 
> which leads to invalid pointer access.
>
> The fix creates specific netns_frags for ovs.
>
> Signed-off-by: wangzhike <wangzh...@jd.com>
> ---

Hi,

It looks like the whitespace has been corrupted in this version of the patch 
that you sent, I cannot apply it. Probably your email client mistreats it when 
sending the email out. A reliable method to send patches correctly via email is 
to use the commandline client 'git send-email'. This is the preferred method. 
If you are unable to set that up, consider attaching the patch to the email (or 
send a pull request on GitHub).

Cheers,
Joe
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: [ovs-discuss] rpmbuild failure for ovs_dpdk

2017-06-22 Thread
Hi,

I saw the output below when executing rpmbuild. I guess it is the root cause,
since it refuses to link the static lib.

checking if gcc -std=gnu99 static flag -static works... no


I tested that if I compile dpdk as a shared library, then the ovs rpmbuild
will succeed.

However, even if I installed glibc-static,
 1) rpmbuild still reports that -static is not supported: "checking if 
gcc -std=gnu99 static flag -static works... no"
 2) a standalone run of ./configure --with-dpdk=xxx reports: "checking if 
gcc -std=gnu99 static flag -static works... yes". Note that previously, when I 
had not installed glibc-static, the standalone ./configure could still succeed 
although it reported "static ... no".

So the question:
why can rpmbuild not correctly detect the -static flag for gcc?

Br,
Wang Zhike
-Original Message-
From: 王志克
Sent: June 22, 2017 9:31
To: 'Darrell Ball'; 'ovs-disc...@openvswitch.org'; ovs-dev@openvswitch.org
Subject: Re: [ovs-discuss] rpmbuild failure for ovs_dpdk

Hi Darrell,

I checked my config.status, and it was correct.

./config.status:S["OVS_LDFLAGS"]=" 
-L/usr/share/dpdk/x86_64-default-linuxapp-gcc/lib"

Any more idea about the failure? Did someone successfully build it? Thanks

BTW, I am using Centos 7.2
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 21, 2017 0:14
To: 王志克; ovs-dev@openvswitch.org; disc...@openvswitch.org
Subject: Re: [ovs-discuss] [ovs-dev] rpmbuild failure for ovs_dpdk

Correction: ovs-disc...@openvswitch.org

On 6/20/17, 9:01 AM, "ovs-discuss-boun...@openvswitch.org on behalf of Darrell 
Ball" <ovs-discuss-boun...@openvswitch.org on behalf of db...@vmware.com> wrote:

Again, send to disc...@openvswitch.org



Do you see something like this ?



darrell@prmh-nsx-perf-server125:~/ovs/ovs_master$ grep -nr LDFLAGS 
_gcc/config.status 

846:S["OVS_LDFLAGS"]=" -L/usr/src/dpdk-16.11/x86_64-native-linuxapp-gcc/lib"



with dpdk-16.07 instead of dpdk-16.11

.

.

.
    


On 6/20/17, 5:43 AM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:



Hi All,



I try to build rpm for ovs+dpdk, but met below compiling issue. Does 
someone know how to fix it? I guess it is related to LDFLAGS='-Wl,-z,relro 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld, but no idea how to fix it.



If I follow below guide (non-rpm), everything is OK.


http://docs.openvswitch.org/en/latest/intro/install/dpdk/



I use dpdk-16.07 and ovs 2.6 without any change.



[root@A01-R06-I187-15 openvswitch-2.6.0]# rpmbuild -bb --without check 
--with dpdk rhel/openvswitch-fedora.spec

Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.PBk26t

+ umask 022

+ cd /root/rpmbuild/BUILD

+ cd /root/rpmbuild/BUILD

+ rm -rf openvswitch-2.6.0

+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/openvswitch-2.6.0.tar.gz

+ /usr/bin/tar -xf -

+ STATUS=0

+ '[' 0 -ne 0 ']'

+ cd openvswitch-2.6.0

+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .

+ exit 0

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.ZxDsjh

+ umask 022

+ cd /root/rpmbuild/BUILD

+ cd openvswitch-2.6.0

+ CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic'

+ export CFLAGS

+ CXXFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic'

+ export CXXFLAGS

+ FFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic 
-I/usr/lib64/gfortran/modules'

+ export FFLAGS

+ FCFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic 
-I/usr/lib64/gfortran/modules'

+ export FCFLAGS

+ LDFLAGS='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld'

+ export LDFLA

[ovs-dev] [PATCH] pkt reassemble: fix kernel panic for ovs reassemble

2017-06-21 Thread
Ovs and the kernel stack would add frag_queue entries to the same netns_frags list.
As a result, ovs and the kernel may access the frag_queue without the correct
lock. Also the struct ipq layout may differ on kernels older than 4.3,
which leads to invalid pointer access.

The fix creates specific netns_frags for ovs.

Signed-off-by: wangzhike 
---
datapath/datapath.c| 22 +++---
datapath/datapath.h|  6 ++
datapath/linux/compat/include/net/inet_frag.h  | 18 -
datapath/linux/compat/include/net/ip.h |  4 ++
.../include/net/netfilter/ipv6/nf_defrag_ipv6.h|  4 ++
datapath/linux/compat/inet_fragment.c  | 83 --
datapath/linux/compat/ip_fragment.c| 66 ++---
datapath/linux/compat/nf_conntrack_reasm.c | 58 +--
8 files changed, 138 insertions(+), 123 deletions(-)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index c85029c..82cad74 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -2297,6 +2297,8 @@ static int __net_init ovs_init_net(struct net *net)
   INIT_LIST_HEAD(_net->dps);
   INIT_WORK(_net->dp_notify_work, ovs_dp_notify_wq);
   ovs_ct_init(net);
+   ovs_netns_frags_init(net);
+   ovs_netns_frags6_init(net);
   return 0;
}
@@ -2332,6 +2334,8 @@ static void __net_exit ovs_exit_net(struct net *dnet)
   struct net *net;
   LIST_HEAD(head);
+   ovs_netns_frags6_exit(dnet);
+   ovs_netns_frags_exit(dnet);
   ovs_ct_exit(dnet);
   ovs_lock();
   list_for_each_entry_safe(dp, dp_next, _net->dps, list_node)
@@ -2368,13 +2372,9 @@ static int __init dp_init(void)
pr_info("Open vSwitch switching datapath %s\n", VERSION);
-err = compat_init();
-if (err)
- goto error;
-
   err = action_fifos_init();
   if (err)
- goto error_compat_exit;
+goto error;
err = ovs_internal_dev_rtnl_link_register();
   if (err)
@@ -2392,10 +2392,14 @@ static int __init dp_init(void)
   if (err)
goto error_vport_exit;
-err = register_netdevice_notifier(_dp_device_notifier);
+   err = compat_init();
   if (err)
goto error_netns_exit;
+   err = register_netdevice_notifier(_dp_device_notifier);
+   if (err)
+goto error_compat_exit;
+
   err = ovs_netdev_init();
   if (err)
goto error_unreg_notifier;
@@ -2410,6 +2414,8 @@ error_unreg_netdev:
   ovs_netdev_exit();
error_unreg_notifier:
   unregister_netdevice_notifier(_dp_device_notifier);
+error_compat_exit:
+   compat_exit();
error_netns_exit:
   unregister_pernet_device(_net_ops);
error_vport_exit:
@@ -2420,8 +2426,6 @@ error_unreg_rtnl_link:
   ovs_internal_dev_rtnl_link_unregister();
error_action_fifos_exit:
   action_fifos_exit();
-error_compat_exit:
-compat_exit();
error:
   return err;
}
@@ -2431,13 +2435,13 @@ static void dp_cleanup(void)
   dp_unregister_genl(ARRAY_SIZE(dp_genl_families));
   ovs_netdev_exit();
   unregister_netdevice_notifier(_dp_device_notifier);
+   compat_exit();
   unregister_pernet_device(_net_ops);
   rcu_barrier();
   ovs_vport_exit();
   ovs_flow_exit();
   ovs_internal_dev_rtnl_link_unregister();
   action_fifos_exit();
-compat_exit();
}
 module_init(dp_init);
diff --git a/datapath/datapath.h b/datapath/datapath.h
index b835ada..8849625 100644
--- a/datapath/datapath.h
+++ b/datapath/datapath.h
@@ -141,6 +141,12 @@ struct ovs_net {
/* Module reference for configuring conntrack. */
   bool xt_label;
+
+#ifdef HAVE_INET_FRAG_LRU_MOVE
+   struct net *net;
+   struct netns_frags ipv4_frags;
+   struct netns_frags nf_frags;
+#endif
};
 extern unsigned int ovs_net_id;
diff --git a/datapath/linux/compat/include/net/inet_frag.h 
b/datapath/linux/compat/include/net/inet_frag.h
index 01d79ad..34078c8 100644
--- a/datapath/linux/compat/include/net/inet_frag.h
+++ b/datapath/linux/compat/include/net/inet_frag.h
@@ -52,22 +52,4 @@ static inline int rpl_inet_frags_init(struct inet_frags 
*frags)
#define inet_frags_init rpl_inet_frags_init
#endif
-#ifndef HAVE_CORRECT_MRU_HANDLING
-/* We reuse the upstream inet_fragment.c common code for managing fragment
- * stores, However we actually store the fragments within our own 'inet_frags'
- * structures (in {ip_fragment,nf_conntrack_reasm}.c). When unloading the OVS
- * kernel module, we need to flush all of the remaining fragments from these
- * caches, or else we will panic with the following sequence of events:
- *
- * 1) A fragment for a packet arrives and is cached in inet_frags. This
- *starts a timer to ensure the fragment does not hang around forever.
- * 2) openvswitch module is unloaded.
- * 3) The timer for the fragment fires, calling into backported OVS code
- *to free the fragment.
- * 4) BUG: 

[ovs-dev] Re: [ovs-discuss] rpmbuild failure for ovs_dpdk

2017-06-21 Thread
Hi Darrell,

I checked my config.status, and it was correct.

./config.status:S["OVS_LDFLAGS"]=" 
-L/usr/share/dpdk/x86_64-default-linuxapp-gcc/lib"

Any more idea about the failure? Did someone successfully build it? Thanks

BTW, I am using Centos 7.2
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 21, 2017 0:14
To: 王志克; ovs-dev@openvswitch.org; disc...@openvswitch.org
Subject: Re: [ovs-discuss] [ovs-dev] rpmbuild failure for ovs_dpdk

Correction: ovs-disc...@openvswitch.org

On 6/20/17, 9:01 AM, "ovs-discuss-boun...@openvswitch.org on behalf of Darrell 
Ball" <ovs-discuss-boun...@openvswitch.org on behalf of db...@vmware.com> wrote:

Again, send to disc...@openvswitch.org



Do you see something like this ?



darrell@prmh-nsx-perf-server125:~/ovs/ovs_master$ grep -nr LDFLAGS 
_gcc/config.status 

846:S["OVS_LDFLAGS"]=" -L/usr/src/dpdk-16.11/x86_64-native-linuxapp-gcc/lib"



with dpdk-16.07 instead of dpdk-16.11

.

.

.



On 6/20/17, 5:43 AM, "ovs-dev-boun...@openvswitch.org on behalf of 王志克" 
<ovs-dev-boun...@openvswitch.org on behalf of wangzh...@jd.com> wrote:



Hi All,



I try to build rpm for ovs+dpdk, but met below compiling issue. Does 
someone know how to fix it? I guess it is related to LDFLAGS='-Wl,-z,relro 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld, but no idea how to fix it.



If I follow below guide (non-rpm), everything is OK.


http://docs.openvswitch.org/en/latest/intro/install/dpdk/



I use dpdk-16.07 and ovs 2.6 without any change.



[root@A01-R06-I187-15 openvswitch-2.6.0]# rpmbuild -bb --without check 
--with dpdk rhel/openvswitch-fedora.spec

Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.PBk26t

+ umask 022

+ cd /root/rpmbuild/BUILD

+ cd /root/rpmbuild/BUILD

+ rm -rf openvswitch-2.6.0

+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/openvswitch-2.6.0.tar.gz

+ /usr/bin/tar -xf -

+ STATUS=0

+ '[' 0 -ne 0 ']'

+ cd openvswitch-2.6.0

+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .

+ exit 0

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.ZxDsjh

+ umask 022

+ cd /root/rpmbuild/BUILD

+ cd openvswitch-2.6.0

+ CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic'

+ export CFLAGS

+ CXXFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic'

+ export CXXFLAGS

+ FFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic 
-I/usr/lib64/gfortran/modules'

+ export FFLAGS

+ FCFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic 
-I/usr/lib64/gfortran/modules'

+ export FCFLAGS

+ LDFLAGS='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld'

+ export LDFLAGS

+ '[' 1 == 1 ']'

+ '[' x86_64 == ppc64le ']'

++ find . -name config.guess -o -name config.sub

+ for i in '$(find . -name config.guess -o -name config.sub)'

++ basename ./build-aux/config.guess

+ '[' -f /usr/lib/rpm/redhat/config.guess ']'

+ /usr/bin/rm -f ./build-aux/config.guess

++ basename ./build-aux/config.guess

+ /usr/bin/cp -fv /usr/lib/rpm/redhat/config.guess 
./build-aux/config.guess

'/usr/lib/rpm/redhat/config.guess' -> './build-aux/config.guess'

+ for i in '$(find . -name config.guess -o -name config.sub)'

++ basename ./build-aux/config.sub

+ '[' -f /usr/lib/rpm/redhat/config.sub ']'

+ /usr/bin/rm -f ./build-aux/config.sub

++ basename ./build-aux/config.sub

+ /usr/bin/cp -fv /usr/lib/rpm/redhat/config.sub ./build-aux/config.sub

'/usr/lib/rpm/redhat/config.sub' -> './build-aux/c

[ovs-dev] rpmbuild failure for ovs_dpdk

2017-06-20 Thread
Hi All,

I am trying to build an rpm for ovs+dpdk, but met the compile issue below. Does someone 
know how to fix it? I guess it is related to LDFLAGS='-Wl,-z,relro 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld', but I have no idea how to fix it.

If I follow below guide (non-rpm), everything is OK.
http://docs.openvswitch.org/en/latest/intro/install/dpdk/

I use dpdk-16.07 and ovs 2.6 without any change.

[root@A01-R06-I187-15 openvswitch-2.6.0]# rpmbuild -bb --without check --with 
dpdk rhel/openvswitch-fedora.spec
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.PBk26t
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd /root/rpmbuild/BUILD
+ rm -rf openvswitch-2.6.0
+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/openvswitch-2.6.0.tar.gz
+ /usr/bin/tar -xf -
+ STATUS=0
+ '[' 0 -ne 0 ']'
+ cd openvswitch-2.6.0
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.ZxDsjh
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd openvswitch-2.6.0
+ CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic'
+ export CFLAGS
+ CXXFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic'
+ export CXXFLAGS
+ FFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic 
-I/usr/lib64/gfortran/modules'
+ export FFLAGS
+ FCFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
-fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1  -m64 -mtune=generic 
-I/usr/lib64/gfortran/modules'
+ export FCFLAGS
+ LDFLAGS='-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld'
+ export LDFLAGS
+ '[' 1 == 1 ']'
+ '[' x86_64 == ppc64le ']'
++ find . -name config.guess -o -name config.sub
+ for i in '$(find . -name config.guess -o -name config.sub)'
++ basename ./build-aux/config.guess
+ '[' -f /usr/lib/rpm/redhat/config.guess ']'
+ /usr/bin/rm -f ./build-aux/config.guess
++ basename ./build-aux/config.guess
+ /usr/bin/cp -fv /usr/lib/rpm/redhat/config.guess ./build-aux/config.guess
'/usr/lib/rpm/redhat/config.guess' -> './build-aux/config.guess'
+ for i in '$(find . -name config.guess -o -name config.sub)'
++ basename ./build-aux/config.sub
+ '[' -f /usr/lib/rpm/redhat/config.sub ']'
+ /usr/bin/rm -f ./build-aux/config.sub
++ basename ./build-aux/config.sub
+ /usr/bin/cp -fv /usr/lib/rpm/redhat/config.sub ./build-aux/config.sub
'/usr/lib/rpm/redhat/config.sub' -> './build-aux/config.sub'
++ dirname /usr/share/dpdk/x86_64-default-linuxapp-gcc/.config
+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu 
--program-prefix= --disable-dependency-tracking --prefix=/usr 
--exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc 
--datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 
--libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib 
--mandir=/usr/share/man --infodir=/usr/share/info --enable-libcapng 
--with-dpdk=/usr/share/dpdk/x86_64-default-linuxapp-gcc --enable-ssl 
--with-pkidir=/var/lib/openvswitch/pki
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking how to create a pax tar archive... gnutar
checking for style of include used by make... GNU
checking for x86_64-redhat-linux-gnu-gcc... no
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... none
checking for gcc option to accept ISO C99... -std=gnu99
checking how to run the C preprocessor... gcc -std=gnu99 -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for fgrep... /usr/bin/grep -F
checking for egrep... /usr/bin/grep -E
checking for perl... /usr/bin/perl
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking minix/config.h 

[ovs-dev] How to efficiently connect docker network to ovs+dpdk switch

2017-06-15 Thread
Hi All,

Previously I use kernel ovs, and docker veth-pair port can be added to ovs 
bridge directly. In this case, docker traffic from kernel will direct to ovs 
kernel module.

Now I want to use ovs+dpdk to speed up forwarding performance, but I am 
wondering how docker traffic would reach the ovs userspace bridge. From my 
understanding, veth-pair traffic would
always go through the kernel, and it needs to be copied to userspace, then bridged by 
ovs+dpdk. The performance is quite low.

So question:
1) What is the proposed docker network port for ovs+dpdk? Ideally kernel should 
NOT be involved. I am not sure whether it is possible.
2) currently a veth-pair port can only be handled by the main thread 
(NON_PMD_CORE_ID) since its is_pmd attribute is false. This means it can only 
be handled by 1 CPU. How can multiple CPUs handle such a case?

Br,
Wang Zhike

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Re: Re: Re: Re: Query for missing function

2017-05-31 Thread
Hi Darrell,

Yes, it is hard to choose a proper value suitable for every network/situation.
The network may even change from time to time, so indeed the user needs to be involved
in tuning.

So your idea is acceptable.

Br,
Wangzhike


-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 1, 2017 11:42
To: 王志克; Ben Pfaff; Darrell Ball
Cc: ovs-dev@openvswitch.org
Subject: Re: Re: Re: Re: Re: [ovs-dev] Query for missing function



On 5/31/17, 8:06 PM, "王志克" <wangzh...@jd.com> wrote:

Hi Darrell,

In my opinion, it may be also hard for user to decide "configurable buffer 
size".


Yep, but the user knows their network or situation best.

 Also I guess the default value should be to enable the fragment support. ( If 
we give such option, I can imagine most user will enable it, right?)

I agree.
However, let us say we chose values for the memory thresholds for the user, 
like Linux.
The user thinks “the fragmentation thing is taken care of”. That is likely 
wrong, maybe they should be thinking more upfront.

My suggestion is to use Linux kernel as best practice.

That is the stock answer; no doubt.

Just my personal thought.

Br,
Wangzhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 1, 2017 10:16
To: 王志克; Ben Pfaff; Darrell Ball
Cc: ovs-dev@openvswitch.org
Subject: Re: Re: Re: Re: [ovs-dev] Query for missing function



    On 5/31/17, 6:07 PM, "王志克" <wangzh...@jd.com> wrote:

Hi,

See my reply in line. Thanks.

Br,
Wangzhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 1, 2017 3:29
To: 王志克; Ben Pfaff; Darrell Ball
Cc: ovs-dev@openvswitch.org
Subject: Re: Re: Re: [ovs-dev] Query for missing function

    

On 5/26/17, 6:24 PM, "王志克" <wangzh...@jd.com> wrote:

Hi Darrell,

I indeed observed IP fragment scenario in our other product 
deployment, and resulted some critical issue. Then I am
 wondering how to handle it in OVS+DPDK alternative solution.

[Darrell]
I am not sure what critical means here; it varies widely based on 
several considerations.

Maybe you just describe what the situation was and what happened.
Let me ask a few questions to draw that out.

Q1) Was that “other deployment” using kernel datapath and experienced a 
transient input of fragmented traffic from a Vxlan tunnel that the kernel could 
not handle due to lower throughput for fragmented traffic and/or kernel 
fragmentation thresholds used and ended up dropping those packets (possibly 
retried with delay) ?
Or something else ?
[Wangzhike] I observed Linux kernel panic when handling IP fragmented 
packets both from OVS and general IP stack, like Vxlan tunnel packets and the 
overlay packets are both fragmented. I am preparing patch for that.



Q2) Was the transient input of fragmented packets an intentional hack 
used to trash other traffic or just network misconfiguration or you have no 
clue ?
[Wangzhike] I believe it is kind of attack. Lots of uncompleted IP 
fragments were received.

Yes, fragmentation is good at driving forwarding performance down and/or 
increasing attack surface.
If one has control over the network boundaries, it can be avoided within 
those boundaries.
In other scenarios, the fragments get into a control volume because of 
others fault, intentional or otherwise.
If the fragments are legitimate and in large numbers for long enough, the 
misconfiguration must be fixed at source because they can’t be handled anyways 
in software.
Fragments in low numbers/less often provide less pressure to fix the 
misconfiguration, since it is what we can handle with software, assuming they 
are legitimate.

If OVS-DPDK adds support for IP fragmentation, it will be susceptible to 
the same issues the kernel datapath has.
I don’t know of a configurable, non-zero default fragmentation buffer that 
will work in all unforeseen legitimate cases.
Bigger fragmentation buffers are more susceptible to exploits.
A zero default configurable buffer size seems most clear and then let the 
user decide how much “IP Fragmentation”
they want.

Thoughts ? 



Typical cases are:
1) VxLan segmented packet reaches Vswitch and need to pop the VxLan 
header for further handling. In kernel OVS, normally Linux kernel will 
reassemble it before sending to OVS module. Since this happens in real world, 
we need to handle it though the possibility of happening is quite low.

[Darrell]
It is possible that fragmented packets arrive from a Vxlan tunnel – 
that is obvious.

[ovs-dev] Re: Re: Re: Re: Query for missing function

2017-05-31 Thread
Hi Darrell,

In my opinion, it may also be hard for the user to decide on a "configurable buffer 
size". Also, I guess the default should be to enable fragment support. 
(If we give such an option, I can imagine most users will enable it, right?)

My suggestion is to use Linux kernel as best practice.

Just my personal thought.

Br,
Wangzhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 1, 2017 10:16
To: 王志克; Ben Pfaff; Darrell Ball
Cc: ovs-dev@openvswitch.org
Subject: Re: Re: Re: Re: [ovs-dev] Query for missing function



On 5/31/17, 6:07 PM, "王志克" <wangzh...@jd.com> wrote:

Hi,

See my reply in line. Thanks.

Br,
Wangzhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: June 1, 2017 3:29
To: 王志克; Ben Pfaff; Darrell Ball
Cc: ovs-dev@openvswitch.org
Subject: Re: Re: Re: [ovs-dev] Query for missing function



    On 5/26/17, 6:24 PM, "王志克" <wangzh...@jd.com> wrote:

Hi Darrell,

I indeed observed IP fragment scenario in our other product deployment, 
and resulted some critical issue. Then I am
 wondering how to handle it in OVS+DPDK alternative solution.

[Darrell]
I am not sure what critical means here; it varies widely based on several 
considerations.

Maybe you just describe what the situation was and what happened.
Let me ask a few questions to draw that out.

Q1) Was that “other deployment” using kernel datapath and experienced a 
transient input of fragmented traffic from a Vxlan tunnel that the kernel could 
not handle due to lower throughput for fragmented traffic and/or kernel 
fragmentation thresholds used and ended up dropping those packets (possibly 
retried with delay) ?
Or something else ?
[Wangzhike] I observed Linux kernel panic when handling IP fragmented 
packets both from OVS and general IP stack, like Vxlan tunnel packets and the 
overlay packets are both fragmented. I am preparing patch for that.



Q2) Was the transient input of fragmented packets an intentional hack used 
to trash other traffic or just network misconfiguration or you have no clue ?
[Wangzhike] I believe it is kind of attack. Lots of uncompleted IP 
fragments were received.

Yes, fragmentation is good at driving forwarding performance down and/or 
increasing attack surface.
If one has control over the network boundaries, it can be avoided within those 
boundaries.
In other scenarios, the fragments get into a control volume because of others 
fault, intentional or otherwise.
If the fragments are legitimate and in large numbers for long enough, the 
misconfiguration must be fixed at source because they can’t be handled anyways 
in software.
Fragments in low numbers/less often provide less pressure to fix the 
misconfiguration, since it is what we can handle with software, assuming they 
are legitimate.

If OVS-DPDK adds support for IP fragmentation, it will be susceptible to the 
same issues the kernel datapath has.
I don’t know of a configurable, non-zero default fragmentation buffer that will 
work in all unforeseen legitimate cases.
Bigger fragmentation buffers are more susceptible to exploits.
A zero default configurable buffer size seems most clear and then let the user 
decide how much “IP Fragmentation”
they want.

Thoughts ? 



Typical cases are:
1) VxLan segmented packet reaches Vswitch and need to pop the VxLan header 
for further handling. In kernel OVS, normally Linux kernel will reassemble it 
before sending to OVS module. Since this happens in real world, we need to 
handle it though the possibility of happening is quite low.

[Darrell]
It is possible that fragmented packets arrive from a Vxlan tunnel – that is 
obvious.


2) Segmented packets go through conntrack. In kernel OVS, it will 
reuse Kernel reassembly function to make reassembled packet go through 
conntrack.

[Darrell]
“2” is not a use case; it is simply a statement of what the kernel does 
when faced with fragmented packets, which is the topic of this thread.
[Wangzhike] Let me revise it. We set some rule on OVS DPDK conntrack, and 
one VM app sends large packet (eg 9600 size while MTU is 1500). In this case, 
fragmented packet will reach OVS conntrack. We hope such packet can be handled 
as kernel ovs behavior(able to be reassembled before really go through 
conntrack) instead of tagging as INV state.

FYI, there are several configuration parameters that determine the actual 
behavior obtained. The effective behavior could even be that the kernel drops 
all these fragments.




Above cases really happen in current product deployment, and we want to 
keep it work when migrating to OVS+DPDK solution.

Br,
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@

[ovs-dev] Re: Re: Query for missing function

2017-05-26 Thread
Hi Darrell,

I have indeed observed the IP fragment scenario in another product deployment of 
ours, and it resulted in some critical issues. So I am wondering how to handle it 
in the OVS+DPDK alternative solution.

Typical cases are:
1) A VxLan segmented packet reaches the Vswitch and the VxLan header needs to be 
popped for further handling. In kernel OVS, normally the Linux kernel will reassemble 
it before sending it to the OVS module. Since this happens in the real world, we need 
to handle it even though the probability of it happening is quite low.
2) Segmented packets go through conntrack. In kernel OVS, the kernel reassembly 
function is reused so that the reassembled packet goes through conntrack.

The above cases really happen in the current product deployment, and we want to keep 
them working when migrating to the OVS+DPDK solution.

Br,
Wang Zhike

-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: May 27, 2017 2:45
To: 王志克; Ben Pfaff; Darrell Ball
Cc: ovs-dev@openvswitch.org
Subject: Re: Re: [ovs-dev] Query for missing function



On 5/26/17, 2:00 AM, "王志克" <wangzh...@jd.com> wrote:

Hi Darrell, Ben,

Thanks for your reply. Glad to hear that we are getting close to a usable 
candidate patch.

What is the plan for reassembly and fragmentation support in OVS+DPDK? For example:
   1, received underlay VxLAN fragmented packets,
   2, received overlay fragmented packets that will go through conntrack,
   3, output packets with size > out_port_mtu


IP frag. is still on the radar.

I have a large dataset of information regarding IP frag usage from a widely 
distributed virtualization product.

However, I would like to know your usage requirements for IP frag. Do you want 
it for feature parity with the kernel, or does it solve some problems you are 
facing? Can you explain your specific needs?



Br,
Wang zhike

-----Original Message-----
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: May 26, 2017, 9:45
To: Ben Pfaff; 王志克; Darrell Ball
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Query for missing function



On 5/25/17, 2:04 PM, "ovs-dev-boun...@openvswitch.org on behalf of Ben 
Pfaff" <ovs-dev-boun...@openvswitch.org on behalf of b...@ovn.org> wrote:

On Wed, May 24, 2017 at 12:48:24PM +, 王志克 wrote:
> Reading the DPDK section of the OVS 2.6 release notes, I note the following:
> 
>  * Basic connection tracking for the userspace datapath (no ALG,
>    fragmentation or NAT support yet)
> 
> I am wondering, for the missing parts (no ALG, fragmentation, NAT), may
> I have the release plan for these features? Or is there a draft version
> for trial?

I think that Darrell (CCed) is working on that for OVS 2.8.  He has
posted patches before.  I expect to see a revision of it pretty soon.

NAT patches have been out for a while, and a minor revision will come out 
next week along with a separate series for FTP ALG support.
The NAT patches have been tested by a couple of other folks, externally and 
internally, as well.



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Re: Query for missing function

2017-05-26 Thread
Hi Darrell, Ben,

Thanks for your reply. Glad to hear that we are getting close to a usable 
candidate patch.

What is the plan for reassembly and fragmentation support in OVS+DPDK? For example:
   1, received underlay VxLAN fragmented packets,
   2, received overlay fragmented packets that will go through conntrack,
   3, output packets with size > out_port_mtu (see the sketch after this list)
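For item 3, a rough sketch of the offset arithmetic involved in fragmenting an 
oversized packet at egress; this is illustrative only and ignores IP options, 
checksum updates and the DF bit:

/* Illustrative only: how an oversized IP payload would be split at egress.
 * Fragment offsets are carried in 8-byte units, so each fragment's payload
 * (except possibly the last) must be a multiple of 8 bytes. */
#include <stdio.h>

#define IP_HDR_LEN 20

int
main(void)
{
    int pkt_len = 9600;            /* e.g. a jumbo packet from a VM */
    int mtu = 1500;                /* egress port MTU */

    int payload = pkt_len - IP_HDR_LEN;
    int max_frag_payload = (mtu - IP_HDR_LEN) & ~7;   /* round down to 8 bytes */

    int offset = 0;
    int frag = 0;
    while (offset < payload) {
        int this_payload = payload - offset;
        int more_fragments = 0;
        if (this_payload > max_frag_payload) {
            this_payload = max_frag_payload;
            more_fragments = 1;      /* MF set on every fragment but the last */
        }
        printf("frag %d: offset=%d bytes (field=%d), len=%d, MF=%d\n",
               frag, offset, offset / 8, IP_HDR_LEN + this_payload,
               more_fragments);
        offset += this_payload;
        frag++;
    }
    return 0;
}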

Br,
Wang zhike

-----Original Message-----
From: Darrell Ball [mailto:db...@vmware.com] 
Sent: May 26, 2017, 9:45
To: Ben Pfaff; 王志克; Darrell Ball
Cc: ovs-dev@openvswitch.org
Subject: Re: [ovs-dev] Query for missing function



On 5/25/17, 2:04 PM, "ovs-dev-boun...@openvswitch.org on behalf of Ben Pfaff" 
<ovs-dev-boun...@openvswitch.org on behalf of b...@ovn.org> wrote:

On Wed, May 24, 2017 at 12:48:24PM +, 王志克 wrote:
> Reading the DPDK section of the OVS 2.6 release notes, I note the following:
> 
>  * Basic connection tracking for the userspace datapath (no ALG,
>    fragmentation or NAT support yet)
> 
> I am wondering, for the missing parts (no ALG, fragmentation, NAT), may
> I have the release plan for these features? Or is there a draft version
> for trial?

I think that Darrell (CCed) is working on that for OVS 2.8.  He has
posted patches before.  I expect to see a revision of it pretty soon.

NAT patches have been out for a while, and a minor revision will come out next 
week along with a separate series for FTP ALG support.
The NAT patches have been tested by a couple of other folks, externally and 
internally, as well.



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Query for missing function

2017-05-24 Thread
Hi All,

Reading the DPDK section of the OVS 2.6 release notes, I note the following:

 * Basic connection tracking for the userspace datapath (no ALG,
   fragmentation or NAT support yet)

I am wondering, for the missing parts (no ALG, fragmentation, NAT), may I have 
the release plan for these features? Or is there a draft version for trial?

In addition, suppose OVS+DPDK receives tunnel packets, e.g. VxLAN packets 
terminated on the OVS eth0 port that need to be forwarded on to a VM after the 
VxLAN header is popped. If such packets also have IP fragmentation fields set 
in the header (the MF flag or a non-zero offset), they should be reassembled 
before the VxLAN header is popped. Does OVS+DPDK support this reassembly 
function? Thanks.
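For illustration, a minimal sketch of that check, assuming nothing about OVS 
internals and using only the standard <netinet/ip.h> definitions (the helper 
name outer_ip_is_fragment is invented for this sketch):

/* Sketch only: a VXLAN packet whose *outer* IP header has MF set or a
 * non-zero fragment offset cannot simply be decapsulated, because non-first
 * fragments do not carry the UDP/VXLAN headers at all, and even the first
 * fragment holds only part of the inner packet. */
#include <arpa/inet.h>
#include <netinet/ip.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static bool
outer_ip_is_fragment(const struct ip *nh)
{
    uint16_t off_field = ntohs(nh->ip_off);

    /* IP_MF is the "more fragments" flag; IP_OFFMASK extracts the offset. */
    return (off_field & IP_MF) || (off_field & IP_OFFMASK);
}

int
main(void)
{
    struct ip hdr;

    memset(&hdr, 0, sizeof hdr);

    hdr.ip_off = htons(IP_MF);      /* First fragment: MF set, offset 0. */
    printf("first fragment: %s\n",
           outer_ip_is_fragment(&hdr) ? "reassemble before decap" : "decap");

    hdr.ip_off = htons(0);          /* Unfragmented packet. */
    printf("whole packet:   %s\n",
           outer_ip_is_fragment(&hdr) ? "reassemble before decap" : "decap");
    return 0;
}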

Best regards,
Wang zhike

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev