Re: [ovs-dev] dest mac in fast datapath does not act as expected

2022-08-18 Thread ychen




Thanks!
Upgrading to OVS version 2.12.1 fixed my problem.














At 2022-08-17 18:49:46, "Ilya Maximets"  wrote:
>On 8/17/22 11:32, ychen wrote:
>> hi,
>>    when we send two packets with different destination MACs within 10 s (the fast datapath flow 
>> aging time) and with the same userspace flow action, the second packet is handled 
>> incorrectly.
>> 
>> 1. problem phenomenon: 
>> userspace flow:  
>> in_port=1,table=0,cookie=0x123,priority=500,tun_id=0x3562,actions=set_field:fe:ff:ff:ff:ff:ff->eth_src,set_field:fa:16:3e:c0:ee:8c->eth_dst,output:tap111
>> packet captured:
>> 13:51:02.097914 fe:ff:ff:ff:ff:ff > fa:16:3e:c0:ee:8c, ethertype IPv4 
>> (0x0800), length 115: 10.194.50.240.53 > 10.199.16.44.48651: 35479 1/0/1 A 
>> 66.102.251.24 (73)   // first packet: MAC correctly rewritten
>> 13:51:04.213568 fe:ff:ff:ff:ff:ff > 00:00:00:00:00:00, ethertype IPv4 
>> (0x0800), length 115: 10.194.50.240.53 > 10.199.16.44.48651: 35479 1/0/1 A 
>> 66.102.251.24 (73)   // second packet: destination MAC left unchanged
>> 
>> 
>>    2. environment
>>    OVS 2.12; the issue can only be reproduced with the kernel datapath
>> 
>> 
>>    3. reproduction steps
>>   client node--> server node
>> 
>> 
>>   3.1 client node configuration:
>>   $ sudo ovs-vsctl show
>> 77f97d1d-e34f-4e4c-b4f1-1d2299a4411a
>> Bridge br-test
>> fail_mode: secure
>> Port "vxlan11"
>> Interface "vxlan11"
>> type: vxlan
>> options: {in_key=flow, local_ip="10.185.2.87", out_key=flow, 
>> remote_ip=flow}
>> Port br-test
>> Interface br-test
>> type: internal
>> Port "tap11"
>> Interface "tap11"
>> type: internal
>> ovs_version: "2.12.0"
>>  
>> sudo ovs-ofctl dump-flows br-test -O openflow13
>>  cookie=0x0, duration=223377.958s, table=0, n_packets=7512, n_bytes=1755097, 
>> reset_counts in_port=tap11 
>> actions=set_field:0x3562->tun_id,load:0xab90251->NXM_NX_TUN_IPV4_DST[],output:vxlan11
>>  
>>   3.2  server node configuration:
>># ovs-vsctl show
>> f39fb127-019b-41c3-86b7-a420a3b4d7f2
>> Bridge br-int
>> fail_mode: secure
>> Port "vf-10.185.2.81"
>> Interface "vf-10.185.2.81"
>> type: vxlan
>> options: {csum="true", df_default="false", in_key=flow, 
>> local_ip="10.185.2.81", out_key=flow, remote_ip=flow}
>> Port br-int
>> Interface br-int
>> type: internal
>> Port "tap111"
>> Interface "tap111"
>> type: internal
>>   ovs_version: "2.12.0"
>>  
>> ovs-ofctl add-flow br-int -O openflow13 
>> "in_port=1,table=0,cookie=0x123,priority=500,tun_id=0x3562,actions=set_field:fe:ff:ff:ff:ff:ff->eth_src,set_field:c6:3a:16:ec:e0:d9->eth_dst,output:tap111"
>> 
>> 
>>   3.3 sending packets
>> packet payload: 
>>dst mac:c6:3a:16:ec:e0:d9
>>src mac: 02:00:00:00:00:00
>>src ip: 10.194.50.241
>>dst ip: 10.100.100.212 
>>proto: udp
>> l4port: 45678
>> # ovs-ofctl packet-out br-test 1 "table=0" 
>> "c63a16ece0d902000800451c400040118de60ac232f10a6464d48000b26e00082084"
>>sleep 1s, send the second packet:
>> # ovs-ofctl packet-out br-test 1 "table=0" 
>> "02000800451c400040118de60ac232f10a6464d48000b26e00082084"
>>   
>>   3.4 server node packet capture
>>  10:49:59.725865 fe:ff:ff:ff:ff:ff > c6:3a:16:ec:e0:d9, ethertype 
>> IPv4 (0x0800), length 60: 10.194.50.241.32768 > 10.100.100.212.45678: UDP, 
>> length 0
>>  10:50:00.881564 fe:ff:ff:ff:ff:ff > 11:11:11:11:11:11, ethertype 
>> IPv4 (0x0800), length 60: 10.194.50.241.32768 > 10.100.100.212.45678: UDP, 
>> length 0  // wrong: the destination MAC should be c6:3a:16:ec:e0:d9
>>  
>>   3.5  fast datapath flow in server node
>> 
>> recirc_id(0),tunnel(tun_id=0x3562,src=10.185.2.87,dst=10.185.2.81,flags(-df-csum+key)),in_port(1),eth(src=02:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
>>  packets:1, bytes:60, used:2.17

[ovs-dev] dest mac in fast datapath does not act as expected

2022-08-17 Thread ychen
hi,
   when we send two packets with different destination MACs within 10 s (the fast datapath flow 
aging time) and with the same userspace flow action, the second packet is handled 
incorrectly.

1. problem phenomenon: 
userspace flow:  
in_port=1,table=0,cookie=0x123,priority=500,tun_id=0x3562,actions=set_field:fe:ff:ff:ff:ff:ff->eth_src,set_field:fa:16:3e:c0:ee:8c->eth_dst,output:tap111
packet captured:
13:51:02.097914 fe:ff:ff:ff:ff:ff > fa:16:3e:c0:ee:8c, ethertype IPv4 
(0x0800), length 115: 10.194.50.240.53 > 10.199.16.44.48651: 35479 1/0/1 A 
66.102.251.24 (73)   // first packet: MAC correctly rewritten
13:51:04.213568 fe:ff:ff:ff:ff:ff > 00:00:00:00:00:00, ethertype IPv4 
(0x0800), length 115: 10.194.50.240.53 > 10.199.16.44.48651: 35479 1/0/1 A 
66.102.251.24 (73)   // second packet: destination MAC left unchanged


   2. environment
   OVS 2.12; the issue can only be reproduced with the kernel datapath


   3. reproduction steps
  client node--> server node


  3.1 client node configuration:
  $ sudo ovs-vsctl show
77f97d1d-e34f-4e4c-b4f1-1d2299a4411a
Bridge br-test
fail_mode: secure
Port "vxlan11"
Interface "vxlan11"
type: vxlan
options: {in_key=flow, local_ip="10.185.2.87", out_key=flow, 
remote_ip=flow}
Port br-test
Interface br-test
type: internal
Port "tap11"
Interface "tap11"
type: internal
ovs_version: "2.12.0"
 
sudo ovs-ofctl dump-flows br-test -O openflow13
 cookie=0x0, duration=223377.958s, table=0, n_packets=7512, n_bytes=1755097, 
reset_counts in_port=tap11 
actions=set_field:0x3562->tun_id,load:0xab90251->NXM_NX_TUN_IPV4_DST[],output:vxlan11
 
  3.2  server node configuration:
   # ovs-vsctl show
f39fb127-019b-41c3-86b7-a420a3b4d7f2
Bridge br-int
fail_mode: secure
Port "vf-10.185.2.81"
Interface "vf-10.185.2.81"
type: vxlan
options: {csum="true", df_default="false", in_key=flow, 
local_ip="10.185.2.81", out_key=flow, remote_ip=flow}
Port br-int
Interface br-int
type: internal
Port "tap111"
Interface "tap111"
type: internal
  ovs_version: "2.12.0"
 
ovs-ofctl add-flow br-int -O openflow13 
"in_port=1,table=0,cookie=0x123,priority=500,tun_id=0x3562,actions=set_field:fe:ff:ff:ff:ff:ff->eth_src,set_field:c6:3a:16:ec:e0:d9->eth_dst,output:tap111"


  3.3 sending packets
packet payload: 
   dst mac:c6:3a:16:ec:e0:d9
   src mac: 02:00:00:00:00:00
   src ip: 10.194.50.241
   dst ip: 10.100.100.212 
   proto: udp
l4port: 45678
# ovs-ofctl packet-out br-test 1 "table=0" 
"c63a16ece0d902000800451c400040118de60ac232f10a6464d48000b26e00082084"
   sleep 1s, send the second packet:
# ovs-ofctl packet-out br-test 1 "table=0" 
"02000800451c400040118de60ac232f10a6464d48000b26e00082084"
  
  3.4 server node packet capture
 10:49:59.725865 fe:ff:ff:ff:ff:ff > c6:3a:16:ec:e0:d9, ethertype IPv4 
(0x0800), length 60: 10.194.50.241.32768 > 10.100.100.212.45678: UDP, length 0
 10:50:00.881564 fe:ff:ff:ff:ff:ff > 11:11:11:11:11:11, ethertype IPv4 
(0x0800), length 60: 10.194.50.241.32768 > 10.100.100.212.45678: UDP, length 0  
// wrong: the destination MAC should be c6:3a:16:ec:e0:d9
 
  3.5  fast datapath flow in server node

recirc_id(0),tunnel(tun_id=0x3562,src=10.185.2.87,dst=10.185.2.81,flags(-df-csum+key)),in_port(1),eth(src=02:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no),
 packets:1, bytes:60, used:2.176s, actions:,set(eth(src=fe:ff:ff:ff:ff:ff)),3   

correct datapath flow:

recirc_id(0),tunnel(tun_id=0x3562,src=10.185.2.87,dst=10.185.2.81,flags(-df-csum+key)),in_port(1),eth(src=02:00:00:00:00:00,dst=11:11:11:11:11:11),eth_type(0x0800),ipv4(frag=no),
 packets:0, bytes:0, used:never,
actions:set(eth(src=fe:ff:ff:ff:ff:ff,dst=c6:3a:16:ec:e0:d9)),3

Compared with the correct datapath flow, the destination MAC has disappeared from both the 
match and the action.
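
For what it's worth, the invariant that the installed flow appears to violate can be shown with a tiny standalone model (plain C, not OVS code, names made up): when translation elides the dst rewrite because the first packet already carried the target MAC, the installed flow has to match on eth.dst; if it does not, the next packet with a different dst is forwarded unmodified, which is exactly what the capture in 3.4 shows.

/* Tiny standalone model (plain C, not OVS code): if translation drops the
 * dst rewrite because the first packet already carries the target MAC, the
 * cached flow must match on eth.dst.  If it matches on neither (as in the
 * dump in 3.5), the second packet reuses the flow and goes out unmodified. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct cached_flow {
    bool match_on_dst;           /* eth(dst=...) present in the match? */
    char match_dst[18];
    bool rewrite_dst;            /* set(eth(dst=...)) present in the actions? */
    char new_dst[18];
};

static void forward(const struct cached_flow *f, const char *pkt_dst)
{
    if (f->match_on_dst && strcmp(pkt_dst, f->match_dst) != 0) {
        printf("%s -> datapath miss, upcall would install a correct flow\n", pkt_dst);
        return;
    }
    printf("%s -> output with dst %s\n", pkt_dst,
           f->rewrite_dst ? f->new_dst : pkt_dst);
}

int main(void)
{
    /* The flow from 3.5: dst neither matched nor rewritten, because the first
     * packet's dst already equalled the set_field value. */
    struct cached_flow buggy = { false, "", false, "" };

    forward(&buggy, "c6:3a:16:ec:e0:d9");    /* looks correct, by accident */
    forward(&buggy, "11:11:11:11:11:11");    /* forwarded unmodified: the bug */
    return 0;
}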


  
 


 
 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] meter stats cleared when modify meter bands

2021-07-28 Thread ychen
I know that in the latest OVS versions both the kernel datapath and the DPDK userspace 
datapath support the meter action.
What I want to know is why the stats need to be cleared when we only modify the meter bands; are 
there any particular considerations?
I think it would be easy to keep the meter stats when only the meter bands are modified; a rough 
sketch of the idea follows below.
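
A minimal standalone sketch of what keeping the stats could look like (simplified types, not the real dpif-netdev.c structures): copy the accumulated counters from the old dp_meter into the newly allocated one before it replaces the old one.

/* Minimal standalone sketch (simplified types, not the real dpif-netdev.c
 * structures): when only the bands are modified, carry the accumulated
 * counters from the old meter over to the newly allocated one, so the
 * stats survive the modification. */
#include <stddef.h>
#include <stdint.h>

struct band_stats_sketch { uint64_t packet_count; uint64_t byte_count; };

struct meter_sketch {
    uint64_t packet_count;
    uint64_t byte_count;
    size_t n_bands;
    struct band_stats_sketch bands[16];
};

static void
carry_over_meter_stats(const struct meter_sketch *old_m, struct meter_sketch *new_m)
{
    size_t n = old_m->n_bands < new_m->n_bands ? old_m->n_bands : new_m->n_bands;

    new_m->packet_count = old_m->packet_count;
    new_m->byte_count = old_m->byte_count;
    for (size_t i = 0; i < n; i++) {
        new_m->bands[i] = old_m->bands[i];
    }
}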

















At 2021-07-28 13:46:21, "Tonghao Zhang"  wrote:
>On Wed, Jul 28, 2021 at 10:57 AM ychen  wrote:
>>
>> Hi, all:
>> I have a question: why do the meter stats need to be cleared when we just modify the meter
>> bands?
>> When handle_modify_meter() is called, it eventually calls
>> dpif_netdev_meter_set(); in this function a new dp_meter is allocated and
>> attached, hence the stats are cleared.
>>If we just updated the dp_meter band configuration, the stats would be
>> kept across the meter modification.
>>Is there any particular consideration behind this modify behaviour?
>The commit 80738e5f93a70 adds meter support for the kernel datapath,
>and even though the kernel module supports setting the stats, userspace
>doesn't set them.
>https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=96fbc13d7e770b542d2d1fcf700d0baadc6e8063
>
>If needed, we can support this.
>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
>
>-- 
>Best regards, Tonghao
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] meter stats cleared when modify meter bands

2021-07-27 Thread ychen
Hi, all:
I have a question: why do the meter stats need to be cleared when we just modify the meter 
bands?
When handle_modify_meter() is called, it eventually calls 
dpif_netdev_meter_set(); in this function a new dp_meter is allocated and 
attached, hence the stats are cleared.
   If we just updated the dp_meter band configuration, the stats would be kept 
across the meter modification.
   Is there any particular consideration behind this modify behaviour?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] dp_hash algorithm works incorrectly when tcp retransmit

2021-02-03 Thread ychen
We have met a problem where the same TCP session selects different OVS group buckets during 
TCP retransmission, and we can easily reproduce this phenomenon.
After some code research, we found that on TCP retransmit the kernel may call 
sk_rethink_txhash(); this function changes skb->hash, and hence a 
different OVS group bucket is selected.
Does anyone have good suggestions for fixing this problem?
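
Roughly speaking (a standalone sketch, not the datapath code), the selected bucket is a pure function of the packet hash, so a retransmission that got a new random txhash can land in a different bucket:

/* Standalone sketch (not the datapath code): the dp_hash bucket is a pure
 * function of the packet hash, so when sk_rethink_txhash() gives the
 * retransmitted skbs a new random sk_txhash, the masked hash -- and with it
 * the selected bucket -- can change. */
#include <stdint.h>
#include <stdio.h>

static unsigned int pick_bucket(uint32_t skb_hash, unsigned int n_buckets)
{
    uint32_t size = 1;

    while (size < n_buckets) {        /* n_buckets rounded up to a power of two */
        size <<= 1;
    }
    return skb_hash & (size - 1);
}

int main(void)
{
    uint32_t hash_before = 0x5ca1ab1e;    /* hash of the original transmission */
    uint32_t hash_after = 0x0ddba11u;     /* new random txhash after retransmit */

    printf("bucket before retransmit: %u\n", pick_bucket(hash_before, 2));
    printf("bucket after  retransmit: %u\n", pick_bucket(hash_after, 2));
    return 0;
}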

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] same tcp session selects different ovs group bucket when tcp retransmit

2021-01-23 Thread ychen
Hi, all:
recently we met a problem: when using an OVS group with selection method 
dp_hash, the same TCP session selects different group buckets when a TCP packet 
is retransmitted.
If we put a different SNAT gateway in each group bucket, this makes the TCP session get 
reset after a packet retransmission.


we can reproduce this problem in a simple environment:
Node1: (debian 9.8 with kernel version 4.9.65 and ovs version 2.10.1) acts 
as an HTTP server
ovs-vsctl add-br br-int
ovs-vsctl set bridge br-int 
protocols="OpenFlow10","OpenFlow11","OpenFlow12","OpenFlow13","OpenFlow14","OpenFlow15"
ovs-vsctl add-port br-int tap111 -- set interface tap111 type=internal
ovs-vsctl add-port br-int vxlan111 -- set interface vxlan111 type=vxlan 
options:in_key=flow options:local_ip="10.185.2.46" options:out_key=flow 
options:remote_ip=flow
ip link set dev tap111 netns ns111
   ip netns exec ns111 ip link set dev tap111 up
   ip netns exec ns111 ip link set dev tap111 mtu 1450
   ip netns exec ns111 ip address add 10.1.1.1/24 dev tap111


  // only an emulation: set a different nw_ttl in each bucket, so the problem can 
be observed simply by capturing packets
   ovs-ofctl add-group br-int -O openflow15 \
"group_id=2233,type=select,selection_method=dp_hash,bucket=bucket_id=1,actions=mod_nw_ttl:10,output:vxlan111,bucket=bucket_id=2,actions=mod_nw_ttl:20,output:vxlan111"
 ovs-ofctl add-flow br-int -O openflow15 
"priority=100,in_port=tap111,ip,actions=set_field:1122->tun_id,set_field:10.185.2.47->tun_dst,group:2233"


 ovs-ofctl add-flow br-int -O openflow15 
"priority=100,in_port=tap111,arp,actions=set_field:1122->tun_id,set_field:10.185.2.47->tun_dst,output:vxlan111"
 
 ovs-ofctl add-flow br-int -O openflow15 
"priority=100,in_port=vxlan111,tun_id=1122,actions=output:tap111"


 // use tc netem to emulate TCP retransmission
  ip netns exec ns111 tc qdisc add dev tap111 root netem loss 1%




Node2: (debian 9.1 with kernel version 4.9.0 and ovs version 2.8.2) acts 
as an HTTP client
  ovs-vsctl add-br br-int
ovs-vsctl set bridge br-int 
protocols="OpenFlow10","OpenFlow11","OpenFlow12","OpenFlow13","OpenFlow14","OpenFlow15"
ovs-vsctl add-port br-int tap111 -- set interface tap111 type=internal
ovs-vsctl add-port br-int vxlan111 -- set interface vxlan111 type=vxlan 
options:in_key=flow options:local_ip="10.185.2.47" options:out_key=flow 
options:remote_ip=flow
ip link set dev tap111 netns ns111
   ip netns exec ns111 ip link set dev tap111 up
   ip netns exec ns111 ip link set dev tap111 mtu 1450
   ip netns exec ns111 ip address add 10.1.1.8/24 dev tap111


   ovs-ofctl add-flow br-int -O openflow15 
"priority=100,in_port=tap111,actions=set_field:1122->tun_id,set_field:10.185.2.46->tun_dst,output:vxlan111"
 
   ovs-ofctl add-flow br-int -O openflow15 
"priority=100,in_port=vxlan111,tun_id=1122,actions=output:tap111"
In this environment, when we try to fetch a large file from Node1 (the HTTP server), we 
can see that after a TCP retransmission not only the outer VXLAN UDP 
source port changes, but the inner IP header TTL changes as well.


I think sk_rethink_txhash() changes skb->hash on TCP retransmit, 
and any function that calls skb_get_hash() is then affected, such as execute_hash() 
and udp_flow_src_port().
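
A standalone sketch of the other half of the symptom (hand-written, in the spirit of the kernel's udp_flow_src_port(), which derives the port from skb_get_hash() in a similar way): the outer UDP source port is also a pure function of the skb hash, so a rehashed retransmission changes it too.

/* Standalone sketch (hand-written, in the spirit of the kernel's
 * udp_flow_src_port()): the outer VXLAN UDP source port is a pure function
 * of the skb hash, so a retransmission that got a new random txhash also
 * changes the encapsulated source port. */
#include <stdint.h>
#include <stdio.h>

static uint16_t tunnel_src_port(uint32_t skb_hash)
{
    const uint32_t min = 32768, max = 61000;    /* assumed ephemeral port range */

    skb_hash ^= skb_hash << 16;                 /* fold, then scale into the range */
    return (uint16_t) ((((uint64_t) skb_hash * (max - min)) >> 32) + min);
}

int main(void)
{
    printf("src port before retransmit: %u\n", tunnel_src_port(0x5ca1ab1e));
    printf("src port after  retransmit: %u\n", tunnel_src_port(0x0ddba11u));
    return 0;
}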




   
   
 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] question about userspace flow stats with meter

2020-07-27 Thread ychen
hi, I want to know how datapath stats are mapped to userspace flow stats. Is there 
any documentation?
example:
   table=0,in_port=1, meter=11,goto_table:2
   table=2,in_port=1,output:2
   meter: rate=1Mbps


   when I send packets at 2 Mbps from port1, with 10000 packets 
transmitted in total,
  first I expected that table=0 should show stats of 10000 packets and that table=2 
should only have 5000 packets (the meter stats show that 5000 packets were 
dropped),
  but actually both table=0 and table=2 show stats of 10000 packets.
   
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] same tcp session encapsulated with different udp src port in kernel mode if packet has done ip_forward

2019-11-08 Thread ychen
 ofp_port_t in_port;/* OpenFlow in port, or OFPP_NONE. */
 uint16_t mru;  /* If !0, Maximum receive unit of
   fragmented IP packet */
+uint32_t skb_hash;


 enum dpif_upcall_type type;/* Datapath type of the upcall. */
 const struct nlattr *userdata; /* Userdata for DPIF_UC_ACTION Upcalls. */
@@ -772,6 +773,7 @@ recv_upcalls(struct handler *handler)
 struct upcall *upcall = &upcalls[n_upcalls];
 struct flow *flow = &flows[n_upcalls];
 unsigned int mru;
+unsigned int skb_hash;
 int error;


 ofpbuf_use_stub(recv_buf, recv_stubs[n_upcalls],
@@ -792,6 +794,12 @@ recv_upcalls(struct handler *handler)
 mru = 0;
 }


+if (dupcall->skb_hash){
+skb_hash = nl_attr_get_u32(dupcall->skb_hash);
+} else {
+skb_hash = 0;
+}
+
 error = upcall_receive(upcall, udpif->backer, &dupcall->packet,
dupcall->type, dupcall->userdata, flow, mru,
&dupcall->ufid, PMD_ID_NULL);
@@ -816,7 +824,7 @@ recv_upcalls(struct handler *handler)


 upcall->out_tun_key = dupcall->out_tun_key;
 upcall->actions = dupcall->actions;
-
+upcall->skb_hash = skb_hash;
 pkt_metadata_from_flow(&dupcall->packet.md, flow);
 flow_extract(&dupcall->packet, flow);


@@ -1470,6 +1478,7 @@ handle_upcalls(struct udpif *udpif, struct upcall 
*upcalls,
 op->dop.u.execute.needs_help = (upcall->xout.slow & SLOW_ACTION) 
!= 0;
 op->dop.u.execute.probe = false;
 op->dop.u.execute.mtu = upcall->mru;
+    op->dop.u.execute.skb_hash = upcall->skb_hash;
 }
 }


--
2.1.4









At 2019-11-06 12:04:57, "Tonghao Zhang"  wrote:
>On Mon, Nov 4, 2019 at 7:44 PM ychen  wrote:
>>
>>
>>
>> we can easily reproduce this phenomenon by using tcp socket stream sending 
>> from ovs internal port.
>>
>>
>>
>>
>> At 2019-10-30 19:49:16, "ychen"  wrote:
>>
>> Hi,
>>when we use docker to establish tcp session, we found that the packet 
>> which must do upcall to userspace has different encapsulated udp source port
>>with packet that only needs do datapath flow forwarding.
>>
>>
>>After some code research and kprobe debug,  we found the following:
>>1.  use udp_flow_src_port() to get the port
>> so when both skb->l4_hash==0 and skb->sw_hash==0, 5 tuple data will 
>> be used to calculate the skb->hash
>> 2. when first packet of tcp session coming,  packet needs do upcall to 
>> userspace, and then ovs_packet_cmd_execute() called
>> new skb is allocated with both l4_hash and sw_hash set to 0
>> 3. when none first packet of tcp sesion coming, function 
>> ovs_dp_process_packet()->ovs_execute_actions() called,
>> and this time original skb is reserved.
>> when packet has do ip_forward(), kprobe debug prints skb->l4_hash=1, 
>> sw_hash=0
>> 4. we searched kernel code, and found such code:
>>  skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
>> {  if (sk->sk_txhash) {
>> skb->l4_hash = 1;
>> skb->hash = sk->sk_txhash;
>> }
>>}
>>   static inline void sk_set_txhash(struct sock *sk)
>>   {sk->sk_txhash = net_tx_rndhash();  ==>it is a random 
>> value!!}
>>5. so let's have a summary:
>>when packet is processing only in datapath flow, skb->hash is random 
>> value for the same tcp session?
>>when packet needs processing first to userspace, than kernel space, 
>> skb->hash is calculated by 5 tuple?
>>
>>Our testing enviroment:
>>debian 9, kernel 4.9.65
>>ovs version: 2.8.2
>>
>>
>>Simple topo is like this:
>>docker_eth0<---+
>>   | veth ip_forward
>>  
>> +host_veth0<->port-eth(ovs-ineternal)
>> host_veth0 and port-eth device stay in physical host.
>>
>>
>>So can we treat skb->hash as a attribute, when send packet to userspace, 
>> encode this attribute;
>>and then do ovs_packet_cmd_execute(), retrieve the same hash value from 
>> userspace?
>>
>>
>>   another important tips:
>>  if we send packets from qemu based tap device, vxlan source port is always 
>> same for the same tcp session;
>>  only when send packets from docker in which packets will do ip_forward, 
>> vxlan source port may different for same tcp session.
>Should be fixed. The patch will be sent.
>>
>>
>>
>>
>>
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] same tcp session encapsulated with different udp src port in kernel mode if packet has done ip_forward

2019-11-04 Thread ychen



we can easily reproduce this phenomenon by sending a TCP socket stream from 
an OVS internal port.




At 2019-10-30 19:49:16, "ychen"  wrote:

Hi, 
   when we use docker to establish a TCP session, we found that a packet which 
must do an upcall to userspace is encapsulated with a different UDP source port 
   than a packet that only needs datapath flow forwarding.


   After some code research and kprobe debug,  we found the following:
   1.  use udp_flow_src_port() to get the port
so when both skb->l4_hash==0 and skb->sw_hash==0, 5 tuple data will be 
used to calculate the skb->hash
2. when the first packet of the TCP session comes in, the packet needs an upcall to 
userspace, and then ovs_packet_cmd_execute() is called;
a new skb is allocated with both l4_hash and sw_hash set to 0
3. when a non-first packet of the TCP session comes in, 
ovs_dp_process_packet()->ovs_execute_actions() is called,
and this time the original skb is preserved; 
when the packet has gone through ip_forward(), the kprobe debug prints skb->l4_hash=1, 
sw_hash=0
4. we searched kernel code, and found such code:
 skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
{  if (sk->sk_txhash) {
skb->l4_hash = 1;
skb->hash = sk->sk_txhash;
}
   }
  static inline void sk_set_txhash(struct sock *sk)
  {sk->sk_txhash = net_tx_rndhash();  ==>it is a random value!!}
   5. so let's have a summary:
   when a packet is processed only by the datapath flow, skb->hash is a random 
value for the same TCP session,
   but when a packet is first processed in userspace and then in kernel space, 
skb->hash is calculated from the 5-tuple?

   Our testing environment:
   debian 9, kernel 4.9.65
   ovs version: 2.8.2


   Simple topo is like this:
   docker_eth0<---+
  | veth ip_forward
 
+host_veth0<->port-eth(ovs-ineternal)
host_veth0 and port-eth device stay in physical host.


   So can we treat skb->hash as an attribute: when sending the packet to userspace, 
encode this attribute, 
   and then in ovs_packet_cmd_execute() retrieve the same hash value from 
userspace?


  another important tip:
 if we send packets from a qemu-based tap device, the VXLAN source port is always the 
same for a given TCP session;
 only when sending packets from docker, where the packets go through ip_forward, can the VXLAN 
source port differ within the same TCP session.






 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] same tcp session encapsulated with different udp src port in kernel mode if packet has done ip_forward

2019-10-30 Thread ychen
Hi, 
   when we use docker to establish a TCP session, we found that a packet which 
must do an upcall to userspace is encapsulated with a different UDP source port 
   than a packet that only needs datapath flow forwarding.


   After some code research and kprobe debug,  we found the following:
   1.  use udp_flow_src_port() to get the port
so when both skb->l4_hash==0 and skb->sw_hash==0, 5 tuple data will be 
used to calculate the skb->hash
2. when the first packet of the TCP session comes in, the packet needs an upcall to 
userspace, and then ovs_packet_cmd_execute() is called;
a new skb is allocated with both l4_hash and sw_hash set to 0
3. when a non-first packet of the TCP session comes in, 
ovs_dp_process_packet()->ovs_execute_actions() is called,
and this time the original skb is preserved; 
when the packet has gone through ip_forward(), the kprobe debug prints skb->l4_hash=1, 
sw_hash=0
4. we searched kernel code, and found such code:
 skb_set_hash_from_sk(struct sk_buff *skb, struct sock *sk)
{  if (sk->sk_txhash) {
skb->l4_hash = 1;
skb->hash = sk->sk_txhash;
}
   }
  static inline void sk_set_txhash(struct sock *sk)
  {sk->sk_txhash = net_tx_rndhash();  ==>it is a random value!!}
   5. so let's have a summary:
   when a packet is processed only by the datapath flow, skb->hash is a random 
value for the same TCP session,
   but when a packet is first processed in userspace and then in kernel space, 
skb->hash is calculated from the 5-tuple?

   Our testing environment:
   debian 9, kernel 4.9.65
   ovs version: 2.8.2


   Simple topo is like this:
   docker_eth0<---+
  | veth ip_forward
 
+host_veth0<->port-eth(ovs-ineternal)
host_veth0 and port-eth device stay in physical host.


   So can we treat skb->hash as an attribute: when sending the packet to userspace, 
encode this attribute, 
   and then in ovs_packet_cmd_execute() retrieve the same hash value from 
userspace? (A rough sketch of the idea is at the end of this message.)


  another important tip:
 if we send packets from a qemu-based tap device, the VXLAN source port is always the 
same for a given TCP session;
 only when sending packets from docker, where the packets go through ip_forward, can the VXLAN 
source port differ within the same TCP session.
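
Regarding the question above about carrying skb->hash across the upcall, here is a sketch of the kernel-side idea only; the attribute number and function names are hypothetical, not the real openvswitch uAPI.

/* Sketch of the kernel-side idea only: the attribute number and function
 * names are hypothetical, not the real openvswitch uAPI. */
#include <linux/skbuff.h>
#include <net/netlink.h>

#define SKETCH_PACKET_ATTR_HASH 32      /* hypothetical attribute number */

/* Upcall path: tell userspace what the original skb hashed to. */
static int sketch_put_upcall_hash(struct sk_buff *user_skb, struct sk_buff *pkt)
{
	return nla_put_u32(user_skb, SKETCH_PACKET_ATTR_HASH, skb_get_hash(pkt));
}

/* OVS_PACKET_CMD_EXECUTE path: restore the hash on the re-injected packet so
 * the first packet of a flow hashes like the rest of the session. */
static void sketch_restore_hash(struct sk_buff *pkt, const struct nlattr *a)
{
	if (a)
		__skb_set_sw_hash(pkt, nla_get_u32(a), true);
}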
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Do not mix recirculation depth into RSS hash itself.

2019-10-30 Thread ychen
Thanks!
I have verified it in our testing environment, and it really works!








At 2019-10-24 20:32:11, "Ilya Maximets"  wrote:
>Mixing of RSS hash with recirculation depth is useful for flow lookup
>because same packet after recirculation should match with different
>datapath rule.  Setting of the mixed value back to the packet is
>completely unnecessary because recirculation depth is different on
>each recirculation, i.e. we will have different packet hash for
>flow lookup anyway.
>
>This should fix the issue that packets from the same flow could be
>directed to different buckets based on a dp_hash or different ports of
>a balanced bonding in case they were recirculated different number of
>times (e.g. due to conntrack rules).
>With this change, the original RSS hash will remain the same making
>it possible to calculate equal dp_hash values for such packets.
>
>Reported-at: 
>https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/363127.html
>Fixes: 048963aa8507 ("dpif-netdev: Reset RSS hash when recirculating.")
>Signed-off-by: Ilya Maximets 
>---
> lib/dpif-netdev.c | 1 -
> 1 file changed, 1 deletion(-)
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 4546b55e8..c09b8fd95 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -6288,7 +6288,6 @@ dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
> recirc_depth = *recirc_depth_get_unsafe();
> if (OVS_UNLIKELY(recirc_depth)) {
> hash = hash_finish(hash, recirc_depth);
>-dp_packet_set_rss_hash(packet, hash);
> }
> return hash;
> }
>-- 
>2.17.1
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] group dp_hash method works incorrectly when using snat

2019-09-29 Thread ychen
Hi,
   We found that when the same TCP session uses SNAT with a dp_hash group as the 
output action, 
   the SYN packet and the other packets behave differently: the SYN packet is output to 
one group bucket, and the other packets are output to another group bucket.


   Here is the ovs flows:
   table=0,in_port=DOWN_PORT,tun_id=vni,ip,actions=ct(nat,zone=ZID,table=1)
   table=1,ip,ct_state=+new,ct(commit,nat,src=SNAT_PUB_IP,zone=ZID,table=2)
   table=1,ip,ct_state=-new,actions=goto_table(table=2)
   table=2,ip,actions=group:1
   
group=1,type=select,selection_method=dp_hash,bucket=actions=output:UP_PORT1,bucket=actions=output:UP_PORT2


  Here is the datapath flow:
  
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),recirc_id(0),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(src=192.168.100.16/255.255.255.240,frag=no),
 packets:5, bytes:455, used:2.978s, flags:FP., 
actions:meter(248),meter(249),ct(zone=1298,nat),recirc(0x176)
flow-dump from pmd on cpu core: 6
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),ct_state(+new-inv),ct_zone(0x512),recirc_id(0x176),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:0, bytes:0, used:never, 
actions:meter(250),ct(commit,zone=1298,nat(src=172.16.1.152:1024-65535)),recirc(0x177)
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),ct_state(-new-inv),ct_zone(0x512),recirc_id(0x176),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:00:00:00,dst=00:00:00:00:00:00),eth_type(0x0800),ipv4(ttl=64,frag=no),
 packets:4, bytes:389, used:3.002s, flags:FP., 
actions:set(eth(src=fa:25:fa:c2:52:71,dst=xx:xx:xx:xx:xx:xx)),set(ipv4(ttl=63)),hash(hash_l4(0)),recirc(0x178)
flow-dump from pmd on cpu core: 6
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),recirc_id(0x178),dp_hash(0x8a6c9809/0xf),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:4, bytes:389, used:3.025s, flags:FP., actions:2
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),recirc_id(0x178),dp_hash(0xbab97b2e/0xf),in_port(7),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no),
 packets:0, bytes:0, used:never, actions:3
flow-dump from pmd on cpu core: 6
tunnel(tun_id=0x1435,src=10.185.2.87,dst=10.185.2.93,flags(-df+csum+key)),recirc_id(0x177),in_port(7),packet_type(ns=0,id=0),eth(src=02:00:00:00:00:00,dst=00:00:00:00:00:00),eth_type(0x0800),ipv4(ttl=64,frag=no),
 packets:0, bytes:0, used:never, 
actions:set(eth(src=fa:25:fa:c2:52:71,dst=xx:xx:xx:xx:xx:xx)),set(ipv4(ttl=63)),hash(hash_l4(0)),recirc(0x178)


from the above datapath flows we can draw the following conclusions:
 1. the first SYN packet matches ct_state=+new, and recirculates 3 times
 2. other packets match ct_state=-new, and recirculate only 2 times
 3. packets matching +new and packets matching -new have different dp_hash values, 
and hence may be output to different ports
   (TCP packets of the same session being output to different ports increases the 
reordering risk)


we researched the OVS code, and found the following:
 dpif_netdev_packet_get_rss_hash(struct dp_packet *packet,
const struct miniflow *mf)
{
uint32_t hash, recirc_depth;


if (OVS_LIKELY(dp_packet_rss_valid(packet))) {
hash = dp_packet_get_rss_hash(packet);
} else {
hash = miniflow_hash_5tuple(mf, 0);
dp_packet_set_rss_hash(packet, hash);
}


/* The RSS hash must account for the recirculation depth to avoid
 * collisions in the exact match cache */
recirc_depth = *recirc_depth_get_unsafe();
if (OVS_UNLIKELY(recirc_depth)) {
hash = hash_finish(hash, recirc_depth);=> this code changes the RSS 
hash, and this function is called before EMC lookup
dp_packet_set_rss_hash(packet, hash);
}
return hash;
}


so is there any method to fix this problem? 
we tried changing the OVS flow to:
 table=1,ip,ct_state=-new,actions=ct(commit, table=2)
and the problem disappears, but then packets matching ct_state=-new also need to 
recirculate 3 times, which may decrease performance.
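
To make the effect concrete, here is a standalone demo (mix() is a stand-in, not OVS's hash_finish()): the same base RSS hash combined with different recirculation depths gives different dp_hash values, so the +new path (one extra recirculation) and the -new path can select different buckets.

/* Standalone demo (mix() is a stand-in for OVS's hash_finish()): the same
 * base RSS hash combined with different recirculation depths gives different
 * dp_hash values, so the SYN (+new path, one extra recirculation) and the
 * rest of the session (-new path) can select different group buckets. */
#include <stdint.h>
#include <stdio.h>

static uint32_t mix(uint32_t hash, uint32_t depth)
{
    hash ^= depth * 0x9e3779b1u;
    hash ^= hash >> 16;
    return hash;
}

int main(void)
{
    uint32_t rss = 0xdeadbeef;     /* 5-tuple hash, identical for the whole session */
    uint32_t mask = 0xf;           /* dp_hash mask seen in the flow dump above */

    printf("+new path (depth 3): masked dp_hash %u\n", mix(rss, 3) & mask);
    printf("-new path (depth 2): masked dp_hash %u\n", mix(rss, 2) & mask);
    return 0;
}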
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] Meter measures incorrectly when using multi-pmd in ovs 2.10

2019-09-29 Thread ychen
Hi,
   I met a problem when sending packets using netperf in multi-threaded mode.
   The reproducing conditions are as follows:
   1. ovs 2.10 in dpdk usermode with 2 pmds
   2. set a meter with rate=100,000 pps, burst=20,000 packets
   3. when using single-threaded netperf, the meter behaves correctly, and 
traffic above 100,000 pps is dropped;
   when using multi-threaded netperf, we noticed packets coming from 
both pmds, and in this case the meter measurement does not work.


   But the meter behaves correctly in ovs 2.8 whether using a single pmd or multiple pmds.
   Also, we have merged the patch 42697ca7757b594cc841d944e43ffc17905e3188
  (long_delta_t = now / 1000 - meter->used / 1000), but the problem still exists.




   We researched the meter code and found that the meter uses time_usec() to compute 
the delta time in ovs 2.10,
   but when the function dp_netdev_run_meter() is called, the input parameter 'now' 
comes from pmd->ctx.now,
   and pmd->ctx.now may be updated when a packet is received in 
dp_netdev_process_rxq_port(),
   so let's suppose the following condition:
   pmd1 receives a packet at T1
   pmd2 receives a packet at T2 (T2 < T1)
   pmd1 handles the meter first, so meter->used is changed to pmd1->ctx.now
   then the meter is handled in pmd2, with now = pmd2->ctx.now,
   and long_delta_t = now / 1000 - meter->used / 1000 = T2/1000 - T1/1000, 
which will be a negative value!!!
   then the delta time is clamped with the clause:
  delta_t = (long_delta_t > (long long int)meter->max_delta_t)
? meter->max_delta_t : (uint32_t)long_delta_t;
 in this case, delta_t = (uint32_t)(T2/1000 - T1/1000) will wrap around (overflow).


for now, we just fix this problem by replacing the input parameter 'now' with 
time_usec().
we don't know whether this modification has any side effects (a performance issue?)
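
A tiny standalone demo of the suspected wrap-around (plain C, illustrative timestamps):

/* Standalone demo of the suspected wrap-around: if the 'now' seen by one pmd
 * is slightly older than meter->used (already advanced by another pmd), the
 * signed delta is negative and the cast to uint32_t wraps to a huge value,
 * which defeats the rate limiting. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    long long int used_ms = 1000100;            /* meter->used/1000, written by pmd1 */
    long long int now_ms = 1000099;             /* pmd2->ctx.now/1000, slightly older */
    long long int long_delta_t = now_ms - used_ms;    /* -1 */
    uint32_t delta_t = (uint32_t) long_delta_t;       /* wraps to 4294967295 */

    printf("long_delta_t=%lld delta_t=%" PRIu32 "\n", long_delta_t, delta_t);
    return 0;
}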
   
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] vswitchd crashed when revalidate flows in ovs 2.8.2

2019-08-28 Thread ychen



(gdb) p/x seq_mutex
$1 = {
  lock = {
__data = {
  __lock = 0x2, 
  __count = 0x0, 
  __owner = 0x0,  ===> owner is already 0, but it still aborts
  __nusers = 0x0, 
  __kind = 0x2, 
  __spins = 0x0, 
  __elision = 0x0, 
  __list = {
__prev = 0x0, 
__next = 0x0
  }
}, 
__size = {0x2, 0x0 , 0x2, 0x0 }, 
__align = 0x2
  }, 
  where = 0x7f835b0e5520
}





At 2019-08-26 19:51:20, "ychen"  wrote:

Hi, 
   has anyone seen the following backtrace?


   Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock 
-vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f82d6ffd700 (LWP 10089))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f835a2b042a in __GI_abort () at abort.c:89
#2  0x7f835a2a7e67 in __assert_fail_base (fmt=, 
assertion=assertion@entry=0x7f835ab39df2 "mutex->__data.__owner == 0", 
file=file@entry=0x7f835ab39dd5 "../nptl/pthread_mutex_lock.c", 
line=line@entry=81, 
function=function@entry=0x7f835ab39f60 <__PRETTY_FUNCTION__.8475> 
"__pthread_mutex_lock") at assert.c:92
#3  0x7f835a2a7f12 in __GI___assert_fail 
(assertion=assertion@entry=0x7f835ab39df2 "mutex->__data.__owner == 0", 
file=file@entry=0x7f835ab39dd5 "../nptl/pthread_mutex_lock.c", 
line=line@entry=81, 
function=function@entry=0x7f835ab39f60 <__PRETTY_FUNCTION__.8475> 
"__pthread_mutex_lock") at assert.c:101
#4  0x7f835ab30d50 in __GI___pthread_mutex_lock 
(mutex=mutex@entry=0x7f835b3935e0 ) at 
../nptl/pthread_mutex_lock.c:81
#5  0x7f835b064218 in ovs_mutex_lock_at (l_=l_@entry=0x7f835b3935e0 
, where=where@entry=0x7f835b1052cb "lib/seq.c:141")
at lib/ovs-thread.c:76
#6  0x7f835b0841d7 in seq_change (seq=0x55982c7b5630) at lib/seq.c:141
#7  0x7f835b062d06 in ovsrcu_quiesce () at lib/ovs-rcu.c:152
#8  0x7f835b5f7058 in revalidator_sweep__ 
(revalidator=revalidator@entry=0x55982c7bb178, purge=purge@entry=false)
at ofproto/ofproto-dpif-upcall.c:2549
#9  0x7f835b5f9b80 in revalidator_sweep (revalidator=0x55982c7bb178) at 
ofproto/ofproto-dpif-upcall.c:2556
#10 udpif_revalidator (arg=0x55982c7bb178) at ofproto/ofproto-dpif-upcall.c:913
#11 0x7f835b0641d7 in ovsthread_wrapper (aux_=) at 
lib/ovs-thread.c:348
#12 0x7f835ab2e4a4 in start_thread (arg=0x7f82d6ffd700) at 
pthread_create.c:456
#13 0x7f835a364d0f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97


we haven't found a way to reproduce it, but it seems to crash frequently, 
about once a day
  kernel version: 4.9.0-3-openstack-amd64
  ovs version:2.8.2  








 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] vswitchd crashed when revalidate flows in ovs 2.8.2

2019-08-26 Thread ychen
Hi, 
   has anyone seen the following backtrace?


   Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock 
-vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f82d6ffd700 (LWP 10089))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f835a2b042a in __GI_abort () at abort.c:89
#2  0x7f835a2a7e67 in __assert_fail_base (fmt=, 
assertion=assertion@entry=0x7f835ab39df2 "mutex->__data.__owner == 0", 
file=file@entry=0x7f835ab39dd5 "../nptl/pthread_mutex_lock.c", 
line=line@entry=81, 
function=function@entry=0x7f835ab39f60 <__PRETTY_FUNCTION__.8475> 
"__pthread_mutex_lock") at assert.c:92
#3  0x7f835a2a7f12 in __GI___assert_fail 
(assertion=assertion@entry=0x7f835ab39df2 "mutex->__data.__owner == 0", 
file=file@entry=0x7f835ab39dd5 "../nptl/pthread_mutex_lock.c", 
line=line@entry=81, 
function=function@entry=0x7f835ab39f60 <__PRETTY_FUNCTION__.8475> 
"__pthread_mutex_lock") at assert.c:101
#4  0x7f835ab30d50 in __GI___pthread_mutex_lock 
(mutex=mutex@entry=0x7f835b3935e0 ) at 
../nptl/pthread_mutex_lock.c:81
#5  0x7f835b064218 in ovs_mutex_lock_at (l_=l_@entry=0x7f835b3935e0 
, where=where@entry=0x7f835b1052cb "lib/seq.c:141")
at lib/ovs-thread.c:76
#6  0x7f835b0841d7 in seq_change (seq=0x55982c7b5630) at lib/seq.c:141
#7  0x7f835b062d06 in ovsrcu_quiesce () at lib/ovs-rcu.c:152
#8  0x7f835b5f7058 in revalidator_sweep__ 
(revalidator=revalidator@entry=0x55982c7bb178, purge=purge@entry=false)
at ofproto/ofproto-dpif-upcall.c:2549
#9  0x7f835b5f9b80 in revalidator_sweep (revalidator=0x55982c7bb178) at 
ofproto/ofproto-dpif-upcall.c:2556
#10 udpif_revalidator (arg=0x55982c7bb178) at ofproto/ofproto-dpif-upcall.c:913
#11 0x7f835b0641d7 in ovsthread_wrapper (aux_=) at 
lib/ovs-thread.c:348
#12 0x7f835ab2e4a4 in start_thread (arg=0x7f82d6ffd700) at 
pthread_create.c:456
#13 0x7f835a364d0f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:97


we haven't found a way to reproduce it, but it seems to crash frequently, 
about once a day
  kernel version: 4.9.0-3-openstack-amd64
  ovs version:2.8.2  


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] datapath flow will match packet's ttl when we use dec_ttl in action

2019-05-29 Thread ychen
hi,
   when I send IP packets whose IP-header TTL is random in the range 1-255, and 
with all other IP header fields unchanged,
255 datapath flows are generated, each with a different TTL value.
of course, I use the dec_ttl action; here is the code:
case OFPACT_DEC_TTL:
wc->masks.nw_ttl = 0xff; 


   my question is: can we optimize the dec_ttl action to only differentiate TTL > 1 
from TTL <= 1?
   as we all know, when the TTL expires we should send the packet to the controller and let 
it decide whether an ICMP error packet should be sent out.
   when the TTL is larger than 1, I think there is no difference, am I right?
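
A standalone illustration of the flow explosion the current mask causes (plain C, synthetic traffic; whether the megaflow machinery can actually express a TTL range with a bitmask is a separate question):

/* Standalone illustration (synthetic traffic, plain C): with
 * wc->masks.nw_ttl = 0xff every distinct incoming TTL produces its own
 * datapath flow, while a "TTL <= 1 vs TTL > 1" classification would need
 * only two. */
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    bool seen_exact[256] = { false };
    bool seen_le1 = false, seen_gt1 = false;
    int flows_exact = 0, flows_range = 0;

    for (int pkt = 0; pkt < 10000; pkt++) {
        int ttl = 1 + pkt % 255;                 /* TTL cycling through 1..255 */

        if (!seen_exact[ttl]) {                  /* exact match on the TTL byte */
            seen_exact[ttl] = true;
            flows_exact++;
        }
        if (ttl <= 1) {                          /* match only on the TTL class */
            if (!seen_le1) { seen_le1 = true; flows_range++; }
        } else {
            if (!seen_gt1) { seen_gt1 = true; flows_range++; }
        }
    }
    printf("exact TTL match:   %d datapath flows\n", flows_exact);
    printf("TTL<=1 vs TTL>1:   %d datapath flows\n", flows_range);
    return 0;
}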




   
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] why the behavior for weight=0 for group's dp_hash method is different with default selection method?

2019-05-29 Thread ychen
hi,
   I noticed that we can set a bucket's weight to 0 when adding/modifying a group.
   1. when we use the default select method, and all the buckets with weight 
larger than 0 become dead,
  we can still pick a bucket whose weight is 0. here is the code:
 pick_default_select_group()->group_best_live_bucket():
 LIST_FOR_EACH (bucket, list_node, &group->up.buckets) {
if (bucket_is_alive(ctx, bucket, 0)) {   ===> so when only the bucket 
with weight=0 is alive
uint32_t score =
(hash_int(bucket->bucket_id, basis) & 0xffff) * bucket->weight;
if (score >= best_score) {   ===> a bucket with 
weight=0 does match this clause
best_bucket = bucket;
best_score = score;
}


2. but for the dp_hash selection method, we initialize the buckets in group_construct,
and a bucket whose weight is 0 will be excluded. Here is the code:
  for (int hash = 0; hash < n_hash; hash++) {
struct webster *winner = &webster[0];
for (i = 1; i < n_buckets; i++) {
if (webster[i].value > winner->value) {   ===> a bucket with weight=0 
is always excluded
winner = &webster[i];
}
}


so here is my question: why is the behavior different between the dp_hash method and 
the default selection method?
 


 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] can not do ecmp with ovs group when send packet out from userspace vxlan port

2018-08-14 Thread ychen
1. environment
 Bridge br-int
fail_mode: secure
Port br-int
Interface br-int
type: internal
   Port "vf-10.180.0.95"
Interface "vf-10.180.0.95"
type: vxlan
options: {csum="true", df_default="false", in_key=flow, 
local_ip="10.180.0.95", out_key=flow, remote_ip=flow}
Port tap111
Interface tap111
type: internal
   Bridge br-phy
fail_mode: secure
Port "dpdk_phy1"
Interface "dpdk_phy1"
type: dpdk
options: {dpdk-devargs=":01:10.0", n_rxq="2"}
Port br-phy
Interface br-phy
type: internal
Port "dpdk_phy0"
Interface "dpdk_phy0"
type: dpdk
options: {dpdk-devargs=":01:10.1", n_rxq="2"}


01:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)
01:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller 
Virtual Function (rev 01)


175: br-phy:  mtu 1500 qdisc pfifo_fast state 
UNKNOWN group default qlen 1000
link/ether fa:86:77:0b:1a:31 brd ff:ff:ff:ff:ff:ff
inet 10.180.0.95/24 scope global br-phy
   valid_lft forever preferred_lft forever
inet6 fe80::f886:77ff:fe0b:1a31/64 scope link
   valid_lft forever preferred_lft forever


bridge br-phy flows:
table=0,priority=150,in_port=LOCAL actions=group:1
table=0,priority=150,in_port="dpdk_phy0" actions=LOCAL
table=0,priority=150,in_port="dpdk_phy1" actions=LOCAL
group_id=1,type=select,bucket=watch_port:"dpdk_phy0",actions=output:"dpdk_phy0",bucket=watch_port:"dpdk_phy1",actions=output:"dpdk_phy1"


bridge br-int flows:
table=0, priority=100,in_port="tap111", 
actions=set_field:10.180.0.81->tun_dst,set_field:0x1435->tun_id,output:"vf-10.180.0.95"


tap111 configurations:
ip netns ns111
ip link set dev tap111 netns ns111
ip netns exec ns111 ip link set dev tap111 up
ip netns exec ns111 ip addr add 192.168.10.5/24 dev tap111
ip netns exec ns111 ip neigh add 192.168.10.6 lladdr 00:00:00:00:11:66 dev 
tap111


send packet from tap111 with ip_dst=192.168.10.6, ip_src=192.168.10.5, udp dst 
port=5000, udp src port= range from 4~65534


2. phenomenon
   we only ever see packets leaving from dpdk_phy0, instead of sometimes dpdk_phy0 and 
sometimes dpdk_phy1


3. code trace in ovs
a. we can see the packets sent from dpdk_phy0 with the outer 
header (dst=10.180.0.81, src=10.180.0.95, udp src port in the range 32768 to 65535, 
dst=4789), and with the inner header (dst=192.168.10.6, src=192.168.10.5, udp dst 
port=5000, udp src port in the range 4~65534)
   b. as we can see, the default group selection method will be used
 FIRST QUESTION: why don't we use the UDP ports in the hash calculation? in function 
flow_hash_symmetric_l4(), we can see the following code:
  if (fields.eth_type == htons(ETH_TYPE_IP)) {
fields.ipv4_addr = flow->nw_src ^ flow->nw_dst;
fields.ip_proto = flow->nw_proto;
if (fields.ip_proto == IPPROTO_TCP || fields.ip_proto == IPPROTO_SCTP) {
fields.tp_port = flow->tp_src ^ flow->tp_dst;
}
  }
  c. when the packet is sent out from the userspace vxlan port, it will first do the group 
selection, then build the full tunnel packet and send it out
  SECOND QUESTION: how can we use the tunnel src port for the group hash? 
when the packet is translated in xlate_select_group(), flow->tp_src is always 0
  Thread 1 "ovs-vswitchd" hit Breakpoint 2, xlate_default_select_group 
(ctx=0x7ffc80f91e10, group=0x55f7e8024950)
at ofproto/ofproto-dpif-xlate.c:4135
4135struct flow_wildcards *wc = ctx->wc;
(gdb)  p/x ctx->xin->flow->tp_dst
$6 = 0xb512
(gdb)  p/x ctx->xin->flow->tp_src
$8 = 0x0
(gdb) bt
#0  xlate_default_select_group (ctx=0x7ffc80f91e10, group=0x55f7e8024950) at 
ofproto/ofproto-dpif-xlate.c:4135
#1  0x55f7e7440f6d in xlate_select_group (ctx=0x7ffc80f91e10, 
group=0x55f7e8024950) at ofproto/ofproto-dpif-xlate.c:4260
#2  0x55f7e744100f in xlate_group_action__ (ctx=0x7ffc80f91e10, 
group=0x55f7e8024950) at ofproto/ofproto-dpif-xlate.c:4287
#3  0x55f7e74410df in xlate_group_action (ctx=0x7ffc80f91e10, group_id=1) 
at ofproto/ofproto-dpif-xlate.c:4314
#4  0x55f7e7445405 in do_xlate_actions (ofpacts=0x55f7e7ff3758, 
ofpacts_len=8, ctx=0x7ffc80f91e10) at ofproto/ofproto-dpif-xlate.c:6215
#5  0x55f7e7440117 in xlate_recursively (ctx=0x7ffc80f91e10, 
rule=0x55f7e7ffb1f0, deepens=true) at ofproto/ofproto-dpif-xlate.c:3907
#6  0x55f7e744069d in xlate_table_action (ctx=0x7ffc80f91e10, 
in_port=65534, table_id=0 '\000', may_packet_in=true,
honor_table_miss=true, with_ct_orig=false) at 
ofproto/ofproto-dpif-xlate.c:4033
#7  0x55f7e743f07d in apply_nested_clone_actions (ctx=0x7ffc80f91e10, 
in_dev=0x55f7e8033b60, out_dev=0x55f7e7fff320)
at ofproto/ofproto-dpif-xlate.c:3559
#8  0x55f7e743e266 in validate_and_combine_post_tnl_actions 
(ctx=0x7ffc80f91e10, 

Re: [ovs-dev] [PATCH v2 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-04-17 Thread ychen
Hi, Jan:
I think the following code should also be modified
 + for (int hash = 0; hash < n_hash; hash++) {
+ double max_val = 0.0;
+ struct webster *winner;
+for (i = 0; i < n_buckets; i++) {
+if (webster[i].value > max_val) {  ===> if 
bucket->weight=0, and there is only one bucket with weight equal to 0, then 
winner will be null
+max_val = webster[i].value;
+winner = &webster[i];
+}

+}


   Test with this command:
   ovs-ofctl add-group br-int -O openflow15 
"group_id=2,type=select,selection_method=dp_hash,bucket=bucket_id=1,weight=0,actions=output:10"
  vswitchd crashed after the command was issued.



At 2018-04-16 22:26:27, "Jan Scheurich"  wrote:
>The current implementation of the "dp_hash" selection method suffers
>from two deficiences: 1. The hash mask and hence the number of dp_hash
>values is just large enough to cover the number of group buckets, but
>does not consider the case that buckets have different weights. 2. The
>xlate-time selection of best bucket from the masked dp_hash value often
>results in bucket load distributions that are quite different from the
>bucket weights because the number of available masked dp_hash values
>is too small (2-6 bits compared to 32 bits of a full hash in the default
>hash selection method).
>
>This commit provides a more accurate implementation of the dp_hash
>select group by applying the well known Webster method for distributing
>a small number of "seats" fairly over the weighted "parties"
>(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).
>hash selection method). The dp_hash mask is automatically chosen large enough to provide good
>enough accuracy even with widely differing weights.
>
>This distribution happens at group modification time and the resulting
>table is stored with the group-dpif struct. At xlation time, we use the
>masked dp_hash values as index to look up the assigned bucket.
>
>If the bucket should not be live, we do a circular search over the
>mapping table until we find the first live bucket. As the buckets in
>the table are by construction in pseudo-random order with a frequency
>according to their weight, this method maintains correct distribution
>even if one or more buckets are non-live.
>
>Xlation is further simplified by storing some derived select group state
>at group construction in struct group-dpif in a form better suited for
>xlation purposes.
>
>Adapted the unit test case for dp_hash select group accordingly.
>
>Signed-off-by: Jan Scheurich 
>Signed-off-by: Nitin Katiyar 
>Co-authored-by: Nitin Katiyar 
>---
> include/openvswitch/ofp-group.h |   1 +
> ofproto/ofproto-dpif-xlate.c|  74 +---
> ofproto/ofproto-dpif.c  | 146 
> ofproto/ofproto-dpif.h  |  13 
> tests/ofproto-dpif.at   |  18 +++--
> 5 files changed, 221 insertions(+), 31 deletions(-)
>
>diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
>index 8d893a5..af4033d 100644
>--- a/include/openvswitch/ofp-group.h
>+++ b/include/openvswitch/ofp-group.h
>@@ -47,6 +47,7 @@ struct bucket_counter {
> /* Bucket for use in groups. */
> struct ofputil_bucket {
> struct ovs_list list_node;
>+uint16_t aux;   /* Padding. Also used for temporary data. */
> uint16_t weight;/* Relative weight, for "select" groups. */
> ofp_port_t watch_port;  /* Port whose state affects whether this 
> bucket
>  * is live. Only required for fast failover
>diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>index c8baba1..df245c5 100644
>--- a/ofproto/ofproto-dpif-xlate.c
>+++ b/ofproto/ofproto-dpif-xlate.c
>@@ -4235,35 +4235,55 @@ xlate_hash_fields_select_group(struct xlate_ctx *ctx, 
>struct group_dpif *group,
> }
> }
> 
>+static struct ofputil_bucket *
>+group_dp_hash_best_bucket(struct xlate_ctx *ctx,
>+  const struct group_dpif *group,
>+  uint32_t dp_hash)
>+{
>+struct ofputil_bucket *bucket, *best_bucket = NULL;
>+uint32_t n_hash = group->hash_mask + 1;
>+
>+uint32_t hash = dp_hash &= group->hash_mask;
>+ctx->wc->masks.dp_hash |= group->hash_mask;
>+
>+/* Starting from the original masked dp_hash value iterate over the
>+ * hash mapping table to find the first live bucket. As the buckets
>+ * are quasi-randomly spread over the hash values, this maintains
>+ * a distribution according to bucket weights even when some buckets
>+ * are non-live. */
>+for (int i = 0; i < n_hash; i++) {
>+bucket = group->hash_map[(hash + i) % n_hash];
>+if (bucket_is_alive(ctx, bucket, 0)) {
>+best_bucket = bucket;
>+break;
>+}
>+}
>+
>+return best_bucket;
>+}
>+
> static void
> 

Re: [ovs-dev] [PATCH 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-04-10 Thread ychen
Hi, Jan:
When I tested dp_hash with the new patch, vswitchd was killed by a segmentation 
fault under some conditions:
1. add a group with no buckets, then winner will be NULL
2. add buckets with weight 0, then winner will also be NULL


I made a small modification to the patch; could you help check whether it is correct?


diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 8f6070d..b3a9639 100755
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4773,6 +4773,8 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
 webster[i].value = bucket->weight;
 i++;
 }
+//consider bucket weight equal to 0
+if (!min_weight) min_weight = 1;


 uint32_t min_slots = ceil(total_weight / min_weight);
 n_hash = MAX(16, 1L << log_2_ceil(min_slots));
@@ -4794,11 +4796,12 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
 for (int hash = 0; hash < n_hash; hash++) {
 VLOG_DBG("Hash value: %d", hash);
 double max_val = 0.0;
-struct webster *winner;
+struct webster *winner = NULL;
 for (i = 0; i < n_buckets; i++) {
 VLOG_DBG("Webster[%d]: divisor=%d value=%.2f",
  i, webster[i].divisor, webster[i].value);
-if (webster[i].value > max_val) {
+// use >= in case there is only one bucket with weight 0
+if (webster[i].value >= max_val) {
 max_val = webster[i].value;
 winner = &webster[i];
 }
@@ -4827,7 +4830,8 @@ group_set_selection_method(struct group_dpif *group)
 group->selection_method = SEL_METHOD_DEFAULT;
 } else if (!strcmp(selection_method, "dp_hash")) {
 /* Try to use dp_hash if possible at all. */
-if (group_setup_dp_hash_table(group, 64)) {
+uint32_t n_buckets = group->up.n_buckets;
+if (n_buckets && group_setup_dp_hash_table(group, 64)) {
 group->selection_method = SEL_METHOD_DP_HASH;
 group->hash_alg = props->selection_method_param >> 32;
 if (group->hash_alg >= __OVS_HASH_MAX) {




Another question: I found that in the functions xlate_default_select_group and 
xlate_hash_fields_select_group,
when group_best_live_bucket is NULL, ofproto_group_unref is called;
why does the dp_hash function not need to call it when no best bucket is 
found (e.g. a group with no buckets)?





At 2018-03-21 02:16:17, "Jan Scheurich"  wrote:
>The current implementation of the "dp_hash" selection method suffers
>from two deficiences: 1. The hash mask and hence the number of dp_hash
>values is just large enough to cover the number of group buckets, but
>does not consider the case that buckets have different weights. 2. The
>xlate-time selection of best bucket from the masked dp_hash value often
>results in bucket load distributions that are quite different from the
>bucket weights because the number of available masked dp_hash values
>is too small (2-6 bits compared to 32 bits of a full hash in the default
>hash selection method).
>
>This commit provides a more accurate implementation of the dp_hash
>select group by applying the well known Webster method for distributing
>a small number of "seats" fairly over the weighted "parties"
>(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).
>The dp_hash mask is automatically chosen large enough to provide good
>enough accuracy even with widely differing weights.
>
>This distribution happens at group modification time and the resulting
>table is stored with the group-dpif struct. At xlation time, we use the
>masked dp_hash values as index to look up the assigned bucket.
>
>If the bucket should not be live, we do a circular search over the
>mapping table until we find the first live bucket. As the buckets in
>the table are by construction in pseudo-random order with a frequency
>according to their weight, this method maintains correct distribution
>even if one or more buckets are non-live.
>
>Xlation is further simplified by storing some derived select group state
>at group construction in struct group-dpif in a form better suited for
>xlation purposes.
>
>Signed-off-by: Jan Scheurich 
>Signed-off-by: Nitin Katiyar 
>Co-authored-by: Nitin Katiyar 
>Signed-off-by: Jan Scheurich 
>---
> include/openvswitch/ofp-group.h |   1 +
> ofproto/ofproto-dpif-xlate.c|  70 
> ofproto/ofproto-dpif.c  | 142 
> ofproto/ofproto-dpif.h  |  13 
> 4 files changed, 200 insertions(+), 26 deletions(-)
>
>diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
>index 8d893a5..af4033d 100644
>--- a/include/openvswitch/ofp-group.h
>+++ b/include/openvswitch/ofp-group.h
>@@ -47,6 +47,7 @@ struct bucket_counter {
> /* Bucket for use in groups. */
> 

Re: [ovs-dev] can not update userspace vxlan tunnel neigh mac when peer VTEP mac changed

2018-03-27 Thread ychen


Hi, Jan,
  Thanks for your reply.
  We have already modified the code to snoop on GARP packets, but these 2 problems 
still exist.
   I think the main problem is that GARP packets are not sent from the 
interfaces when we change the NIC MAC address or IP address (reading the linux kernel 
code, there is no such process),
   so we must depend on a data packet to trigger the ARP request.
  I know that in the linux kernel, when an ARP request is triggered, data packets are 
cached for a specified time, so the first data packet can still be sent out 
when the ARP reply is received.

  For the second problem, can we update the tunnel neigh cache when we receive a data 
packet from the remote VTEP, since we can fetch tun_src and the outer source MAC from the 
data packet? A small sketch of that idea follows below.
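
A standalone sketch of that idea (hypothetical names, not the real tnl-neigh-cache API): refresh the tunnel neighbour entry from the outer header of every decapsulated packet, so a peer VTEP whose MAC has changed is re-learned from its own data traffic instead of waiting for ARP.

/* Standalone sketch (hypothetical names, not the real tnl-neigh-cache API):
 * refresh the tunnel neighbour entry from the outer header of every
 * decapsulated packet, so a peer VTEP whose MAC has changed is re-learned
 * from its own data traffic. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct neigh_entry {
    uint32_t vtep_ip;                    /* outer IPv4 source of the tunnel packet */
    uint8_t  mac[6];                     /* outer Ethernet source */
    int      valid;
};

static struct neigh_entry cache[64];     /* toy hash table keyed by VTEP IP */

static void
tnl_neigh_refresh_from_data(uint32_t tun_src, const uint8_t outer_smac[6])
{
    struct neigh_entry *e = &cache[tun_src % 64];

    if (!e->valid || e->vtep_ip != tun_src || memcmp(e->mac, outer_smac, 6)) {
        e->vtep_ip = tun_src;
        memcpy(e->mac, outer_smac, 6);
        e->valid = 1;
        printf("tunnel neighbour entry refreshed from a data packet\n");
    }
}

int main(void)
{
    uint8_t old_mac[6] = { 0xfa, 0xeb, 0x26, 0xc3, 0x16, 0xa5 };
    uint8_t new_mac[6] = { 0x24, 0xeb, 0x26, 0xc3, 0x16, 0xa5 };
    uint32_t vtep = (10u << 24) | (182u << 16) | (6u << 8) | 81u;   /* 10.182.6.81 */

    tnl_neigh_refresh_from_data(vtep, old_mac);   /* learned from the first packet */
    tnl_neigh_refresh_from_data(vtep, new_mac);   /* NIC replaced: re-learned */
    return 0;
}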







At 2018-03-28 04:41:12, "Jan Scheurich" <jan.scheur...@ericsson.com> wrote:
>Hi Ychen,
>
>Funny! Again we are already working on a solution for problem 1. 
>
>In our scenario the situation arises with a tunnel next hop being a VRRP 
>switch pair. The switch sends periodic gratuitous ARPs (GARPs) to announce the 
>VRRP IP but OVS native tunneling doesn't snoop on GARPs, only on ARP 
>replies. The host IP stack, on the other hand, accepts these GARPs and stops 
>sending refresh ARP requests itself. Hence nothing for OVS to snoop upon.
>
>The solution is to make OVS snoop on GARP requests also.
> 
>It is quite possible that this will also fix your problem 2. If you also have 
>a VRRP tunnel next hop which just moves its VRRP IP address but not the MAC 
>address, it should send a GARP with the new IP/MAC mapping when it moves the IP 
>address, which would now update OVS' tunnel neighbor cache.
>
>@Mano: Can you submit the GARP patch in the near future?
>
>BR, Jan
>
>> -Original Message-
>> From: ovs-dev-boun...@openvswitch.org 
>> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of ychen
>> Sent: Tuesday, 27 March, 2018 14:44
>> To: d...@openvswitch.org
>> Subject: [ovs-dev] can not update userspace vxlan tunnel neigh mac when peer 
>> VTEP mac changed
>> 
>> Hi,
>>I found that sometime userspace vxlan can not work happily.
>>1.  first data packet loss
>> when tunnel neigh cache is empty, then the first data packet 
>> triggered  sending ARP packet to peer VTEP, and the data packet
>> dropped,
>> tunnel neigh cache added this entry when receive ARP reply packet.
>> 
>> err = tnl_neigh_lookup(out_dev->xbridge->name, _ip6, );
>>if (err) {
>> xlate_report(ctx, OFT_DETAIL,
>>  "neighbor cache miss for %s on bridge %s, "
>>  "sending %s request",
>>  buf_dip6, out_dev->xbridge->name, d_ip ? "ARP" : "ND");
>> if (d_ip) {
>> tnl_send_arp_request(ctx, out_dev, smac, s_ip, d_ip);
>> } else {
>> tnl_send_nd_request(ctx, out_dev, smac, _ip6, _ip6);
>> }
>> return err;
>> }
>> 
>> 
>> 2. connection lost when peer VTEP mac changed
>> when VTEP mac is already in tunnel neigh cache,   exp:
>> 10.182.6.81   fa:eb:26:c3:16:a5   br-phy
>> 
>> so when data packet come in,  it will use this mac for encaping outer 
>> VXLAN header.
>> but VTEP 10.182.6.81  mac changed from  fa:eb:26:c3:16:a5 to  
>> 24:eb:26:c3:16:a5 because of NIC changed.
>> 
>> data packet continue sending with the old mac  fa:eb:26:c3:16:a5, but 
>> the peer VTEP will not accept these packets because of mac
>> not match.
>> the wrong tunnel neigh entry aging until the data packet stop sending.
>> 
>> 
>>if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)) {
>> tnl_neigh_snoop(flow, wc, ctx->xbridge->name);
>> }
>> 
>> 
>> 3. is there anybody has working for these problems?
>> 
>> 
>> 
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] can not update userspace vxlan tunnel neigh mac when peer VTEP mac changed

2018-03-27 Thread ychen
Hi, 
   I found that sometimes userspace vxlan does not work happily.
   1.  first data packet loss
when the tunnel neigh cache is empty, the first data packet triggers 
sending an ARP packet to the peer VTEP, and the data packet is dropped;
the tunnel neigh cache adds the entry when the ARP reply packet is received.
   
err = tnl_neigh_lookup(out_dev->xbridge->name, &d_ip6, &dmac);
if (err) {
    xlate_report(ctx, OFT_DETAIL,
                 "neighbor cache miss for %s on bridge %s, "
                 "sending %s request",
                 buf_dip6, out_dev->xbridge->name, d_ip ? "ARP" : "ND");
    if (d_ip) {
        tnl_send_arp_request(ctx, out_dev, smac, s_ip, d_ip);
    } else {
        tnl_send_nd_request(ctx, out_dev, smac, &s_ip6, &d_ip6);
    }
    return err;
}


2. connection lost when peer VTEP mac changed
When the VTEP MAC is already in the tunnel neigh cache, e.g.:
10.182.6.81   fa:eb:26:c3:16:a5   br-phy

data packets that come in will use this MAC to encapsulate the outer VXLAN 
header.
But the MAC of VTEP 10.182.6.81 changed from fa:eb:26:c3:16:a5 to 
24:eb:26:c3:16:a5 because the NIC was replaced.

Data packets keep being sent with the old MAC fa:eb:26:c3:16:a5, but the 
peer VTEP will not accept them because the MAC does not match.
The stale tunnel neigh entry does not age out as long as data packets keep 
being sent.


   if (ovs_native_tunneling_is_on(ctx->xbridge->ofproto)) {
tnl_neigh_snoop(flow, wc, ctx->xbridge->name);
}


3. Is anybody working on these problems?



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] can not well distributed when use dp_hash for ovs group

2018-03-20 Thread ychen
hi, 
  I tested dp_hash for an OVS group, and found that dp_hash does not 
distribute well; some buckets are never selected.
  In my testing environment, I have 11 buckets:
group_id=131841,type=select,selection_method=dp_hash,
bucket=bucket_id:51162,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.29:80))),
bucket=bucket_id:42099,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.25:80))),
bucket=bucket_id:53526,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.27:80))),
bucket=bucket_id:12221,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.40:80))),
bucket=bucket_id:2787,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.26:80))),
bucket=bucket_id:18951,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.24:80))),
bucket=bucket_id:32559,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.62:80))),
bucket=bucket_id:35550,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.43:80))),
bucket=bucket_id:9026,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.57:80))),
bucket=bucket_id:26811,weight:100,actions=ct(commit,table=70,zone=2,exec(move:NXM_OF_ETH_SRC[]->NXM_NX_CT_LABEL[0..47],nat(dst=10.204.8.34:80)))


 But about 3~5 buckets are never selected.


  In the function xlate_dp_hash_select_group(), I found the code:
  uint32_t mask = (1 << log_2_ceil(n_buckets)) - 1;
  uint32_t basis = 0xc2b73583 * (ctx->xin->flow.dp_hash & mask);
  uint32_t score = (hash_int(bucket->bucket_id, basis) & 0xffff) *
                   bucket->weight;


 For the above formula, if n_buckets is 11, then there are only 16 possible 
values for basis.
So how can we make sure the best score is well distributed across the 11 
buckets?
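A self-contained toy simulation of the quoted selection logic makes the concern 
concrete. toy_hash_int() below is only a stand-in for OVS's real hash_int() and 
the bucket ids are arbitrary, so the exact winners differ from a real switch, but 
the structure is the same: with 11 buckets the mask keeps 4 bits of dp_hash, so 
there are only 16 basis values to spread over the buckets.

#include <stdint.h>
#include <stdio.h>

/* Stand-in mixer, NOT OVS's hash_int(); only the structure matters here. */
static uint32_t
toy_hash_int(uint32_t x, uint32_t basis)
{
    x ^= basis;
    x *= 0x9e3779b1u;
    return x ^ (x >> 16);
}

int
main(void)
{
    enum { N_BUCKETS = 11 };
    int wins[N_BUCKETS] = { 0 };
    uint32_t mask = (1u << 4) - 1;        /* log_2_ceil(11) == 4, so 0xf */

    for (uint32_t dp_hash = 0; dp_hash <= mask; dp_hash++) {
        uint32_t basis = 0xc2b73583u * (dp_hash & mask);
        uint32_t best_score = 0;
        int best = 0;

        for (int b = 0; b < N_BUCKETS; b++) {
            /* All buckets in the group above have weight 100, so the
             * weight factor does not change the winner and is omitted. */
            uint32_t score = toy_hash_int(b, basis) & 0xffff;
            if (score >= best_score) {
                best_score = score;
                best = b;
            }
        }
        wins[best]++;
    }

    for (int b = 0; b < N_BUCKETS; b++) {
        printf("bucket %2d wins %d of 16 dp_hash values\n", b, wins[b]);
    }
    return 0;
}

Since 16 draws cannot cover 11 buckets evenly, and any two dp_hash values that pick 
the same winner leave some other bucket with no draw at all, a few permanently 
unselected buckets (as observed above) is an expected outcome of this scheme.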
 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] is there any performance consideration for max emc cache numbers and megaflow cache numbers?

2018-01-05 Thread ychen
Hi:
in ovs code,
MAX_FLOWS = 65536  // for megaflow
#define EM_FLOW_HASH_SHIFT 13
#define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT)   // for emc cache


So why were 65536 and 8192 chosen? Is there any performance consideration? 
Can I just enlarge these numbers so that packets only hit the EMC and 
megaflow caches?
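For reference, a small sketch of how those constants turn into table sizes. This is 
a simplified model, not the real per-PMD EMC in lib/dpif-netdev.c (which probes more 
than one slot per hash and stores full flow keys), but the sizing arithmetic is just 
this:

#include <stdint.h>
#include <stdio.h>

#define EM_FLOW_HASH_SHIFT   13
#define EM_FLOW_HASH_ENTRIES (1u << EM_FLOW_HASH_SHIFT)   /* 8192 EMC slots */
#define EM_FLOW_HASH_MASK    (EM_FLOW_HASH_ENTRIES - 1)
#define MAX_FLOWS            65536                /* megaflow table limit */

int
main(void)
{
    uint32_t packet_hash = 0x5a5a1234;   /* example per-packet RSS/5-tuple hash */

    /* An EMC slot is derived directly from the packet hash, so the table
     * must be a power of two; growing it trades memory and CPU cache
     * footprint for a higher exact-match hit rate. */
    uint32_t slot = packet_hash & EM_FLOW_HASH_MASK;

    printf("EMC entries: %u, megaflow limit: %u, slot for this hash: %u\n",
           EM_FLOW_HASH_ENTRIES, MAX_FLOWS, slot);
    return 0;
}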


Another question:
is there any document/data on packet throughput in netdev DPDK mode with only 
the EMC cache/megaflow cache, or with only userspace flow lookup?


___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] which fields should be masked or unmasked while using megaflow match?

2017-12-27 Thread ychen
Hi, is there any policy about which fields should be wildcarded when using 
megaflow match?
exp 1:
 table=0, priority=0,actions=NORMAL
 then the datapath flow is like that:
 
recirc_id(0),in_port(3),eth(src=b6:49:dd:5d:3a:a6,dst=2e:b5:7b:d6:52:c2),eth_type(0x0806),
 packets:0, bytes:0, used:never, actions:2
 
recirc_id(0),in_port(2),eth(src=2e:b5:7b:d6:52:c2,dst=b6:49:dd:5d:3a:a6),eth_type(0x0800),ipv4(frag=no),
 packets:12, bytes:1176, used:0.825s, actions:3


exp 2:
table=0,in_port=1,actions=2
table=0,in_port=2,actions=1
then the datapath flow is like that:
recirc_id(0),in_port(2),eth_type(0x0800),ipv4(frag=no), packets:26, bytes:2548, 
used:0.441s, actions:3
recirc_id(0),in_port(3),eth_type(0x0800),ipv4(frag=no), packets:26, bytes:2548, 
used:0.441s, actions:2


My question is: why are ETH_SRC and ETH_DST needed when using the NORMAL action?
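A minimal sketch (not the actual ofproto-dpif-xlate code) of the likely reason exp 1 
differs from exp 2: the megaflow mask records every field the translation looked at, 
and actions=NORMAL runs MAC learning, which reads the source and destination MAC, 
while a plain in_port->output rule never reads them:

#include <string.h>
#include "flow.h"   /* struct flow, struct flow_wildcards (OVS headers) */

static void
xlate_normal_sketch(const struct flow *flow, struct flow_wildcards *wc)
{
    /* Any field the translation examines must become part of the megaflow
     * match, otherwise the cached datapath flow could be reused for packets
     * that should have been handled differently. */
    memset(&wc->masks.dl_src, 0xff, sizeof wc->masks.dl_src);
    memset(&wc->masks.dl_dst, 0xff, sizeof wc->masks.dl_dst);

    /* ...MAC learning and destination lookup based on flow->dl_src and
     * flow->dl_dst would go here... */
    (void) flow;
}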


exp 3:
table=0,in_port=1,nw_src=1.1.1.0/24, actions=2
table=0,in_port=2,nw_src=1.1.1.0/24, actions=1
then the datapath flow is like that:
recirc_id(0),in_port(3),eth_type(0x0800),ipv4(src=1.1.1.0/255.255.255.0,frag=no),
 packets:1863, bytes:182574, used:0.552s, actions:2
recirc_id(0),in_port(2),eth_type(0x0800),ipv4(src=1.1.1.0/255.255.255.0,frag=no),
 packets:1863, bytes:182574, used:0.552s, actions:3


exp 4:
table=0,in_port=1,nw_src=1.1.1.0/24, actions=mod_nw_src:1.1.1.3, output:2
table=0,in_port=2,actions=1


then the datapath flow is like that:
recirc_id(0),in_port(3),eth_type(0x0800),ipv4(src=1.1.1.2,frag=no), packets:37, 
bytes:3626, used:0.332s, actions:set(ipv4(src=1.1.1.3)),2
recirc_id(0),in_port(2),eth_type(0x0800),ipv4(frag=no), packets:37, bytes:3626, 
used:0.332s, actions:3


My question is: why is NW_SRC=1.1.1.2 matched with a full mask (0xffffffff) 
instead of 0xffffff00 like the /24 in the rule we created?
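A rough sketch (not the real odp-util.c code) of one plausible reason for the exact 
match in exp 4: before emitting a datapath set() action, the translation compares the 
packet's current value with the value to be written so it can drop a no-op set, and 
reading the field that way narrows the megaflow mask to an exact match even though 
the OpenFlow rule only matched a /24:

#include <stdbool.h>
#include <string.h>
#include <openvswitch/types.h>   /* ovs_be32 */
#include "flow.h"                /* struct flow_wildcards (OVS headers) */
#include "openvswitch/ofpbuf.h"

/* Returns true if a set(ipv4(src=target)) action was (conceptually) emitted. */
static bool
commit_set_nw_src_sketch(ovs_be32 current, ovs_be32 target,
                         struct flow_wildcards *wc,
                         struct ofpbuf *odp_actions)
{
    /* The field is read here, so it is pulled into the megaflow match. */
    memset(&wc->masks.nw_src, 0xff, sizeof wc->masks.nw_src);

    if (current == target) {
        return false;            /* no-op: no datapath action needed */
    }
    /* ...append set(ipv4(src=target)) to odp_actions here... */
    (void) odp_actions;
    return true;
}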


In short, are there any rules for setting the flow mask when using megaflow 
match? Which fields should be wildcarded, and why?
We can extract all fields from the packet and find the rule that matches it, 
but why are the datapath flow match fields not the same as in the userspace 
rule?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] is there any document about how to build debian package with dpdk?

2017-09-21 Thread ychen
I have read this document, but following this guide I cannot build the 
openvswitch-switch-dpdk package.
I want to build the package with our own libdpdk; are there any guides for 
that?








At 2017-09-21 16:25:58, "Bodireddy, Bhanuprakash" 
 wrote:
>>We modified a little code in DPDK, so we must rebuild the OVS Debian package 
>>with DPDK ourselves.
>>So is there any guide on how to build the openvswitch-dpdk package?
>
>There is a guide on this here 
>http://docs.openvswitch.org/en/latest/intro/install/debian/
>
>- Bhanuprakash.
>
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] is there any document about how to build debian package with dpdk?

2017-09-21 Thread ychen
We modified a little code in DPDK, so we must rebuild the OVS Debian package 
with DPDK ourselves.
So is there any guide on how to build the openvswitch-dpdk package?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] does ovs bfd support flow based tunnel?

2017-09-18 Thread ychen
for flow-based tunnel:
ovs-vsctl add-port br-int vxlan1 -- set interface vxlan1  type=vxlan 
options:remote_ip=flow options:key=flow options:local_ip=10.10.0.1
ovs-vsctl set interface vxlan1 bfd:enable=true


When I enable BFD on such a VXLAN interface, I cannot capture any BFD packets 
on the physical port (which is used by the VXLAN interface).






At 2017-09-14 23:38:19, "Miguel Angel Ajo Pelayo" <majop...@redhat.com> wrote:

What do you mean by flow-based tunnel?


We're using it internally to provide HA connectivity to Gateway_Chassis on OVN, 
and it's working like a charm for monitoring tunnel endpoints on OVS bridges.


https://github.com/openvswitch/ovs/blob/master/ovn/controller/bfd.c



On Tue, Sep 12, 2017 at 9:19 PM, ychen <ychen103...@163.com> wrote:
Can I enable BFD on a flow-based tunnel? Does it work?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] why the max action length is 32K in kernel?

2017-09-12 Thread ychen
In the function nla_alloc_flow_actions(), there is a check: if the action 
length is greater than MAX_ACTIONS_BUFSIZE (32K), the kernel datapath flow 
will not be installed and packets will be dropped.
But in the function xlate_actions(), there is this clause:
if (nl_attr_oversized(ctx.odp_actions->size)) {
/* These datapath actions are too big for a Netlink attribute, so we
 * can't hand them to the kernel directly.  dpif_execute() can execute
 * them one by one with help, so just mark the result as SLOW_ACTION to
 * prevent the flow from being installed. */
COVERAGE_INC(xlate_actions_oversize);
ctx.xout->slow |= SLOW_ACTION;
}
And in the function nl_attr_oversized(), the check is like this:
return payload_size > UINT16_MAX - NLA_HDRLEN;


So we can see that in user space the max action length is almost 64K, but in 
kernel space the max action length is only 32K.
My question is: why are the two limits different? A packet is dropped when 
its action length exceeds 32K, yet it can be executed via the slow path when 
its action length exceeds 64K?
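A small illustration of the two limits being compared (using the values quoted 
above): userspace only marks a flow SLOW_ACTION once the action list no longer fits 
a 16-bit Netlink attribute length, while the kernel rejects anything above 32K, so 
an action list between the two limits passes the userspace check but cannot be 
installed in the kernel datapath:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NLA_HDRLEN          4             /* Netlink attribute header size */
#define MAX_ACTIONS_BUFSIZE (32 * 1024)   /* kernel datapath limit */

static bool
nl_attr_oversized(size_t payload_size)
{
    return payload_size > UINT16_MAX - NLA_HDRLEN;   /* userspace check */
}

int
main(void)
{
    size_t sizes[] = { 16 * 1024, 40 * 1024, 70 * 1024 };

    for (int i = 0; i < 3; i++) {
        size_t s = sizes[i];
        printf("%zu bytes of actions: userspace marks SLOW_ACTION=%s, "
               "kernel accepts=%s\n",
               s, nl_attr_oversized(s) ? "yes" : "no",
               s <= MAX_ACTIONS_BUFSIZE ? "yes" : "no");
    }
    return 0;
}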
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] does ovs bfd support flow based tunnel?

2017-09-12 Thread ychen
Can I enable BFD on a flow-based tunnel? Does it work?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] ifup locked when start ovs in debian9 with systemd

2017-06-20 Thread ychen
Thanks, Shetty.
I tried the patch you provided; it does fix the problem when OVS starts.

At 2017-06-19 23:00:18, "Guru Shetty" <g...@ovn.org> wrote:

What OVS version is this? What is the platform version? i.e Debian/Ubuntu etc. 
Does your OVS have the following fix?
https://github.com/openvswitch/ovs/commit/15af3d44c65eb3cd724378ce1b30c51aa87f4f69



On 19 June 2017 at 07:17, ychen <ychen103...@163.com> wrote:
1. phenomenon
   ifup: waiting for lock on /run/network/ifstate.br-int
2. configurations
   /etc/network/interfaces
   allow-ovs br-int
iface br-int inet manual
  ovs_type OVSBridge

  ovs_ports tap111


allow-br-int tap111
iface ngwintp inet manual
  ovs_bridge br-int
  ovs_type OVSIntPort


3. start ovs
systemctl start openvswitch-switch
now we can see 2 ifup processes, and when we use the command "systemctl status 
openvswitch-switch",
we can see the error "ifup: waiting for lock on /run/network/ifstate.br-int"


4. I found that in the OVS ifupdown.sh script, there is this bash command:
if /etc/init.d/openvswitch-switch status > /dev/null 2>&1; then :; else
/etc/init.d/openvswitch-switch start
  fi
  Does this mean: if openvswitch is not running, then start it?
  But when using systemd, the command "/etc/init.d/openvswitch-switch status" 
always returns a non-zero value,
  hence openvswitch is restarted and ifup is invoked again; that is what 
causes the LOCK.
5. When using the following bash command instead:
  if ovs_ctl status > /dev/null 2>&1; then :; else
/etc/init.d/openvswitch-switch start
fi
  then we can start and stop openvswitch smoothly,
  but when we use "ifup --allow=ovs br-int", ifup gets LOCKED again.


6. My question is:
why do we need to bring up the openvswitch process in ifupdown.sh?
Is there a simple way to fix this problem?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] ifup locked when start ovs in debian9 with systemd

2017-06-19 Thread ychen
1. phenomenon
   ifup: waiting for lock on /run/network/ifstate.br-int
2. configurations
   /etc/network/interfaces
   allow-ovs br-int
iface br-int inet manual
  ovs_type OVSBridge

  ovs_ports tap111


allow-br-int tap111
iface ngwintp inet manual
  ovs_bridge br-int
  ovs_type OVSIntPort


3. start ovs
systemctl start openvswitch-switch
now we can see 2 ifup processes, and when we use the command "systemctl status 
openvswitch-switch",
we can see the error "ifup: waiting for lock on /run/network/ifstate.br-int"


4. I found that in the OVS ifupdown.sh script, there is this bash command:
if /etc/init.d/openvswitch-switch status > /dev/null 2>&1; then :; else
/etc/init.d/openvswitch-switch start
  fi
  Does this mean: if openvswitch is not running, then start it?
  But when using systemd, the command "/etc/init.d/openvswitch-switch status" 
always returns a non-zero value,
  hence openvswitch is restarted and ifup is invoked again; that is what 
causes the LOCK.
5. When using the following bash command instead:
  if ovs_ctl status > /dev/null 2>&1; then :; else
/etc/init.d/openvswitch-switch start
fi
  then we can start and stop openvswitch smoothly,
  but when we use "ifup --allow=ovs br-int", ifup gets LOCKED again.


6. My question is:
why do we need to bring up the openvswitch process in ifupdown.sh?
Is there a simple way to fix this problem?
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-05-15 Thread ychen
I can reproduce this problem with the script provided by vguntaka in both OVS 
version 2.5 and OVS version 2.6.

1.   Add bridge

ovs-vsctl add-br br0

 

2.   Add vm port

ovs-vsctl add-port br0 tap0 -- set interface tap0 type=internal

 

ip netns add ns0

ip link set dev tap0 netns ns0

ip netns exec ns0 ip link set dev tap0 up

ip netns exec ns0 ip addr add dev tap0 1.1.1.2/24

ip netns exec ns0 ip route add default via 1.1.1.1

ip netns exec ns0 ip neigh add 1.1.1.1 lladdr 00:00:00:00:11:11 dev tap0

 

3.   Send packet

ip netns exec ns0 ping 2.2.2.2

 

4.   Add flows (make sure packets are still being sent)

while true

do

ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=1,actions=controller"

ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=2,actions=controller"

ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=3,actions=controller"

ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=4,actions=controller"

ovs-ofctl del-flows br0

done

 

After waiting about 1 or 2 minutes, the error “OFPT_ERROR (xid=0x4): 
OFPFMFC_BAD_COMMAND” is printed in the console.

 

Also, I noticed that when using OpenFlow13, the error disappears, like this:

ovs-ofctl add-flow br0 
"priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" -O 
openflow13
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev