Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-06-01 Thread Ben Pfaff
On Tue, May 16, 2017 at 12:07:38PM +0800, ychen wrote:
> I can reproduce this problem with the script provided by vguntaka in both 
> OVS version 2.5 and OVS version 2.6.
> 
> 1.   Add bridge
> 
> ovs-vsctl add-br br0
> 
>  
> 
> 2.   Add vm port
> 
> ovs-vsctl add-port br0 tap0 -- set interface tap0 type=internal
> 
>  
> 
> ip netns add ns0
> 
> ip link set dev tap0 netns ns0
> 
> ip netns exec ns0 ip link set dev tap0 up
> 
> ip netns exec ns0 ip addr add dev tap0 1.1.1.2/24
> 
> ip netns exec ns0 ip route add default via 1.1.1.1
> 
> ip netns exec ns0 ip neigh add 1.1.1.1 lladdr 00:00:00:00:11:11 dev tap0
> 
>  
> 
> 3.   Send packet
> 
> ip netns exec ns0 ping 2.2.2.2
> 
>  
> 
> 4.   Add flows (make sure packet is always sending)
> 
> while true
> 
> do
> 
> ovs-ofctl add-flow br0 
> "priority=200,table=123,idle_timeout=1,in_port=1,actions=controller"
> 
> ovs-ofctl add-flow br0 
> "priority=200,table=123,idle_timeout=1,in_port=2,actions=controller"
> 
> ovs-ofctl add-flow br0 
> "priority=200,table=123,idle_timeout=1,in_port=3,actions=controller"
> 
> ovs-ofctl add-flow br0 
> "priority=200,table=123,idle_timeout=1,in_port=4,actions=controller"
> 
> ovs-ofctl del-flows br0
> 
> done
> 
>  
> 
> After waiting about one or two minutes, the error "OFPT_ERROR (xid=0x4): 
> OFPFMFC_BAD_COMMAND" is printed to the console.

Thanks for working on debugging this.  Can you get a backtrace from GDB,
or an error report from valgrind?
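One way to gather that (a sketch; paths and options may need adjusting for a
particular install):

# Backtrace of all ovs-vswitchd threads at the moment the error appears:
gdb -p $(pidof ovs-vswitchd) -batch -ex 'thread apply all bt' > ovs-bt.txt

# For a valgrind report, stop the normal service and run the daemon in the
# foreground under valgrind (assumes the default database socket), then
# re-run the reproduction script:
valgrind --num-callers=30 --log-file=valgrind.log \
    ovs-vswitchd --no-chdir -vconsole:info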


Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-05-15 Thread ychen
I can reproduce this problem with the script provided by vguntaka in both OVS 
version 2.5 and OVS version 2.6.

1.   Add bridge

ovs-vsctl add-br br0

 

2.   Add vm port

ovs-vsctl add-port br0 tap0 -- set interface tap0 type=internal

 

ip netns add ns0

ip link set dev tap0 netns ns0

ip netns exec ns0 ip link set dev tap0 up

ip netns exec ns0 ip addr add dev tap0 1.1.1.2/24

ip netns exec ns0 ip route add default via 1.1.1.1

ip netns exec ns0 ip neigh add 1.1.1.1 lladdr 00:00:00:00:11:11 dev tap0

 

3.   Send packet

ip netns exec ns0 ping 2.2.2.2

 

4.   Add flows (make sure packet is always sending)

while true

do

ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=1,actions=controller"

ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=2,actions=controller"

ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=3,actions=controller"

ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=4,actions=controller"

ovs-ofctl del-flows br0

done

 

After waiting about one or two minutes, the error "OFPT_ERROR (xid=0x4): 
OFPFMFC_BAD_COMMAND" is printed to the console.

 

Also, I noticed that when using OpenFlow 1.3, the error disappears, like this:

ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=1,actions=controller" \
-O OpenFlow13
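(If negotiating OpenFlow 1.3 avoids the error, the same effect can be applied 
to every connection by restricting the bridge's protocol list -- a workaround 
sketch, not a fix for the underlying bug:)

ovs-vsctl set bridge br0 protocols=OpenFlow13
ovs-ofctl -O OpenFlow13 dump-flows br0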


Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-05-10 Thread Zang MingJie

Confirmed; it can be easily reproduced using the described method.

Using ovs 2.6.2

On 01/23/2017 11:58 PM, Vidyasagara Guntaka via dev wrote:

Hi Ben,

We could reproduce this with the latest version, 2.6.1.  When we compiled the 
code, we removed -O2 from CFLAGS; this seems to make it happen more 
frequently.  With the following script running, the error starts happening 
within a few seconds and then continues to happen every few seconds.  In 
summary, our suspicion is that having no controller set and no NORMAL 
processing flow triggers the stack pointed out in the gdb session more often, 
which is why we hit this race condition so easily with the script below.  
(Even if there is a default NORMAL processing flow entry, it will be deleted 
after the first iteration of the script.)

Also, a few things about the setup - just in case:
  * enp5s0 is the physical interface on this hypervisor.
  * vport0 and vport1 are tap interfaces corresponding to two VMs running 
on this hypervisor.
  * When the script was running, we had pings issued from the VMs so that 
packets make it to the bridge br0.

Here is a small script that makes it happen on several of our hypervisors:

while true
do
ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=1,actions=controller"
ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=2,actions=controller"
ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=3,actions=controller"
ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=4,actions=controller"
ovs-ofctl del-flows br0
done

Here is our bridge br0 setup:

[root@deepspace ~]# ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 192.168.2.142  netmask 255.255.255.0  broadcast 192.168.2.255
inet6 fe80::213:3bff:fe0f:1301  prefixlen 64  scopeid 0x20<link>
ether 00:13:3b:0f:13:01  txqueuelen 1000  (Ethernet)
RX packets 89417814  bytes 12088012200 (11.2 GiB)
RX errors 0  dropped 82  overruns 0  frame 0
TX packets 32330647  bytes 3168352394 (2.9 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@deepspace ~]# ovs-vsctl show
54f89e00-edd2-486e-9626-6d11c7d8b0b6
Bridge "br0"
Port "vport1"
Interface "vport1"
Port "br0"
Interface "br0"
type: internal
Port vtep
Interface vtep
type: vxlan
options: {key=flow, remote_ip="192.168.1.141"}
Port "vport0"
Interface "vport0"
Port "enp5s0"
Interface "enp5s0"
ovs_version: "2.6.1"

[root@deepspace ~]# ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:00133b0f1301
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src 
mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(enp5s0): addr:00:13:3b:0f:13:01
 config: 0
 state:  0
 current:1GB-FD AUTO_NEG
 advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER 
AUTO_NEG AUTO_PAUSE AUTO_PAUSE_ASYM
 supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG
 speed: 1000 Mbps now, 1000 Mbps max
 2(vport0): addr:fe:00:00:00:00:03
 config: 0
 state:  0
 current:10MB-FD COPPER
 speed: 10 Mbps now, 0 Mbps max
 3(vport1): addr:fe:00:00:00:00:04
 config: 0
 state:  0
 current:10MB-FD COPPER
 speed: 10 Mbps now, 0 Mbps max
 4(vtep): addr:aa:97:d2:a9:19:ed
 config: 0
 state:  0
 speed: 0 Mbps now, 0 Mbps max
 LOCAL(br0): addr:00:13:3b:0f:13:01
 config: 0
 state:  0
 speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

[root@deepspace ~]# ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.6.1
OpenFlow versions 0x1:0x4

Please let us know if you need anything else to reproduce this.

Thanks,
Sagar.


On Jan 18, 2017, at 1:19 PM, Ben Pfaff  wrote:

If you can come up with simple reproduction instructions that work for
me, I'm happy to track this down.  It's probably something very simple.

On Tue, Jan 17, 2017 at 08:50:20AM -0800, Vidyasagara Guntaka wrote:

This issue happened on our in-use systems and we were trying to find a way
to move forward avoiding this issue so that we do not have to upgrade OVS
on thousands of our hypervisors causing down time. Our debugging did help
us avoid the issue for now by installing an explicit rule to drop
packets when there is no match and this issue is not seen over many hours
of test runs.

We will definitely run this test with the latest version, but we will need 
more time since we are busy with our release-related activities.

Regards,
Sagar.

On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff 

Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-23 Thread Vidyasagara Guntaka via dev
Hi Ben,

We could reproduce this with the latest version, 2.6.1.  When we compiled the 
code, we removed -O2 from CFLAGS; this seems to make it happen more 
frequently.  With the following script running, the error starts happening 
within a few seconds and then continues to happen every few seconds.  In 
summary, our suspicion is that having no controller set and no NORMAL 
processing flow triggers the stack pointed out in the gdb session more often, 
which is why we hit this race condition so easily with the script below.  
(Even if there is a default NORMAL processing flow entry, it will be deleted 
after the first iteration of the script.)
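(For anyone trying to reproduce: both conditions can be checked on a given 
bridge with, for example:)

ovs-vsctl get-controller br0     # no output means no controller is configured
ovs-ofctl dump-flows br0         # look for a NORMAL or other catch-all entry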

Also, a few things about the setup - just in case:
  * enp5s0 is the physical interface on this hypervisor.
  * vport0 and vport1 are tap interfaces corresponding to two VMs running 
on this hypervisor.
  * When the script was running, we had pings issued from the VMs so that 
packets make it to the bridge br0.

Here is a small script that makes it happen on several of our hypervisors:

while true
do
ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=1,actions=controller"
ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=2,actions=controller"
ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=3,actions=controller"
ovs-ofctl add-flow br0 \
"priority=200,table=123,idle_timeout=1,in_port=4,actions=controller"
ovs-ofctl del-flows br0 
done

Here is our bridge br0 setup:

[root@deepspace ~]# ifconfig br0
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
inet 192.168.2.142  netmask 255.255.255.0  broadcast 192.168.2.255
inet6 fe80::213:3bff:fe0f:1301  prefixlen 64  scopeid 0x20<link>
ether 00:13:3b:0f:13:01  txqueuelen 1000  (Ethernet)
RX packets 89417814  bytes 12088012200 (11.2 GiB)
RX errors 0  dropped 82  overruns 0  frame 0
TX packets 32330647  bytes 3168352394 (2.9 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@deepspace ~]# ovs-vsctl show
54f89e00-edd2-486e-9626-6d11c7d8b0b6
Bridge "br0"
Port "vport1"
Interface "vport1"
Port "br0"
Interface "br0"
type: internal
Port vtep
Interface vtep
type: vxlan
options: {key=flow, remote_ip="192.168.1.141"}
Port "vport0"
Interface "vport0"
Port "enp5s0"
Interface "enp5s0"
ovs_version: "2.6.1"

[root@deepspace ~]# ovs-ofctl show br0
OFPT_FEATURES_REPLY (xid=0x2): dpid:00133b0f1301
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src 
mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(enp5s0): addr:00:13:3b:0f:13:01
 config: 0
 state:  0
 current:1GB-FD AUTO_NEG
 advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER 
AUTO_NEG AUTO_PAUSE AUTO_PAUSE_ASYM
 supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-HD 1GB-FD COPPER AUTO_NEG
 speed: 1000 Mbps now, 1000 Mbps max
 2(vport0): addr:fe:00:00:00:00:03
 config: 0
 state:  0
 current:10MB-FD COPPER
 speed: 10 Mbps now, 0 Mbps max
 3(vport1): addr:fe:00:00:00:00:04
 config: 0
 state:  0
 current:10MB-FD COPPER
 speed: 10 Mbps now, 0 Mbps max
 4(vtep): addr:aa:97:d2:a9:19:ed
 config: 0
 state:  0
 speed: 0 Mbps now, 0 Mbps max
 LOCAL(br0): addr:00:13:3b:0f:13:01
 config: 0
 state:  0
 speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

[root@deepspace ~]# ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.6.1
OpenFlow versions 0x1:0x4

Please let us know if you need anything else to reproduce this.

Thanks,
Sagar.

> On Jan 18, 2017, at 1:19 PM, Ben Pfaff  wrote:
> 
> If you can come up with simple reproduction instructions that work for
> me, I'm happy to track this down.  It's probably something very simple.
> 
> On Tue, Jan 17, 2017 at 08:50:20AM -0800, Vidyasagara Guntaka wrote:
>> This issue happened on our in-use systems and we were trying to find a way
>> to move forward avoiding this issue so that we do not have to upgrade OVS
>> on thousands of our hypervisors causing down time. Our debugging did help
>> us avoid the issue for now by installing an explicit rule to drop
>> packets when there is no match and this issue is not seen over many hours
>> of test runs.
>> 
>> We will definitely run this test with the latest version, but we will need
>> more time since we are busy with our release-related activities.
>> 
>> Regards,
>> Sagar.
>> 
>> On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff  wrote:
>> 
>>> It would be more helpful to have a simple reproduction case.
>>> 
>>> 

Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-18 Thread Ben Pfaff
If you can come up with simple reproduction instructions that work for
me, I'm happy to track this down.  It's probably something very simple.

On Tue, Jan 17, 2017 at 08:50:20AM -0800, Vidyasagara Guntaka wrote:
> This issue happened on our in-use systems and we were trying to find a way
> to move forward avoiding this issue so that we do not have to upgrade OVS
> on thousands of our hypervisors causing down time. Our debugging did help
> us avoid the issue for now by installing an explicit rule to drop
> packets when there is no match and this issue is not seen over many hours
> of test runs.
> 
> We will definitely run this test with the latest version, but we will need
> more time since we are busy with our release-related activities.
> 
> Regards,
> Sagar.
> 
> On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff  wrote:
> 
> > It would be more helpful to have a simple reproduction case.
> >
> > Why haven't you tried a newer version from branch-2.5?
> >
> > On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote:
> > > Hi Ben,
> > >
> > > Here is more debug information related to this incident (still using
> > version 2.5.0):
> > >
> > > Summary :
> > >
> > > We think that there is a race condition in ovs-vswitchd between OF
> > > controller connection processing and packet-miss handling.
> > >
> > > Reasoning :
> > >
> > > Please consider the following GDB Debug Session:
> > >
> > > Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810,
> > protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999
> > > (gdb) f 2
> > > #2  0x0045f586 in connmgr_wants_packet_in_on_miss
> > (mgr=0x16a6de0) at ofproto/connmgr.c:1613
> > > 1613  enum ofputil_protocol protocol =
> > ofconn_get_protocol(ofconn);
> > > (gdb) p *ofconn
> > > $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash =
> > 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type =
> > OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true,
> > >   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID,
> > packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170,
> > schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0,
> > >   controller_id = 0, reply_counter = 0x1673190, master_async_config =
> > {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0,
> > n_delete = 0, n_modify = 0,
> > >   first_op = -9223372036854775808, last_op = -9223372036854775808,
> > next_op_report = 9223372036854775807, op_backoff = -9223372036854775808,
> > monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0,
> > > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates =
> > {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles =
> > {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}}
> > > (gdb) bt
> > > #0  ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD)
> > at ofproto/connmgr.c:999
> > > #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at
> > ofproto/connmgr.c:982
> > > #2  0x0045f586 in connmgr_wants_packet_in_on_miss
> > (mgr=0x16a6de0) at ofproto/connmgr.c:1613
> > > #3  0x00435261 in rule_dpif_lookup_from_table
> > (ofproto=0x16a6880, version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0,
> > stats=0x0, table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true,
> > > honor_table_miss=true) at ofproto/ofproto-dpif.c:3973
> > > #4  0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0,
> > xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188
> > > #5  0x004481b1 in revalidate_ukey (udpif=0x16a7300,
> > ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40,
> > reval_seq=585728, recircs=0x7f2ace7f8a30)
> > > at ofproto/ofproto-dpif-upcall.c:1866
> > > #6  0x00448fb2 in revalidate (revalidator=0x1691990) at
> > ofproto/ofproto-dpif-upcall.c:2186
> > > #7  0x0044593e in udpif_revalidator (arg=0x1691990) at
> > ofproto/ofproto-dpif-upcall.c:862
> > > #8  0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at
> > lib/ovs-thread.c:340
> > > #9  0x7f2ad75c2184 in start_thread () from /lib/x86_64-linux-gnu/
> > libpthread.so.0
> > > #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > > (gdb) f 1
> > > #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at
> > ofproto/connmgr.c:982
> > > 982   ofconn_set_protocol(CONST_CAST(struct ofconn *,
> > ofconn),
> > > (gdb) l
> > > 977   {
> > > 978   if (ofconn->protocol == OFPUTIL_P_NONE &&
> > > 979   rconn_is_connected(ofconn->rconn)) {
> > > 980   int version = rconn_get_version(ofconn->rconn);
> > > 981   if (version > 0) {
> > > 982   ofconn_set_protocol(CONST_CAST(struct ofconn *,
> > ofconn),
> > > 983   ofputil_protocol_from_ofp_
> > version(version));
> > > 984   }
> > > 985   }
> > > 986
> > > (gdb) p *ofconn
> > > $3 = {node = {prev = 

Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-17 Thread Vidyasagara Guntaka via dev
This issue happened on our in-use systems and we were trying to find a way
to move forward avoiding this issue so that we do not have to upgrade OVS
on thousands of our hypervisors causing down time. Our debugging did help
us avoid the issue for now by installing an explicit rule to drop
packets when there is no match and this issue is not seen over many hours
of test runs.
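(A catch-all rule of that kind can be installed with something like the 
following; the table and priority values are illustrative:)

ovs-ofctl add-flow br0 "table=0,priority=0,actions=drop"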

We will definitely run this test with the latest version, but we will need 
more time since we are busy with our release-related activities.

Regards,
Sagar.

On Tue, Jan 17, 2017 at 8:42 AM, Ben Pfaff  wrote:

> It would be more helpful to have a simple reproduction case.
>
> Why haven't you tried a newer version from branch-2.5?
>
> On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote:
> > Hi Ben,
> >
> > Here is more debug information related to this incident (still using
> version 2.5.0):
> >
> > Summary :
> >
> > We think that there is a race condition in ovs-vswitchd between OF
> > controller connection processing and packet-miss handling.
> >
> > Reasoning :
> >
> > Please consider the following GDB Debug Session:
> >
> > Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810,
> protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999
> > (gdb) f 2
> > #2  0x0045f586 in connmgr_wants_packet_in_on_miss
> (mgr=0x16a6de0) at ofproto/connmgr.c:1613
> > 1613  enum ofputil_protocol protocol =
> ofconn_get_protocol(ofconn);
> > (gdb) p *ofconn
> > $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash =
> 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type =
> OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true,
> >   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID,
> packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170,
> schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0,
> >   controller_id = 0, reply_counter = 0x1673190, master_async_config =
> {3, 7, 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0,
> n_delete = 0, n_modify = 0,
> >   first_op = -9223372036854775808, last_op = -9223372036854775808,
> next_op_report = 9223372036854775807, op_backoff = -9223372036854775808,
> monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0,
> > n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates =
> {prev = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles =
> {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}}
> > (gdb) bt
> > #0  ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD)
> at ofproto/connmgr.c:999
> > #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at
> ofproto/connmgr.c:982
> > #2  0x0045f586 in connmgr_wants_packet_in_on_miss
> (mgr=0x16a6de0) at ofproto/connmgr.c:1613
> > #3  0x00435261 in rule_dpif_lookup_from_table
> (ofproto=0x16a6880, version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0,
> stats=0x0, table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true,
> > honor_table_miss=true) at ofproto/ofproto-dpif.c:3973
> > #4  0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0,
> xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188
> > #5  0x004481b1 in revalidate_ukey (udpif=0x16a7300,
> ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40,
> reval_seq=585728, recircs=0x7f2ace7f8a30)
> > at ofproto/ofproto-dpif-upcall.c:1866
> > #6  0x00448fb2 in revalidate (revalidator=0x1691990) at
> ofproto/ofproto-dpif-upcall.c:2186
> > #7  0x0044593e in udpif_revalidator (arg=0x1691990) at
> ofproto/ofproto-dpif-upcall.c:862
> > #8  0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at
> lib/ovs-thread.c:340
> > #9  0x7f2ad75c2184 in start_thread () from /lib/x86_64-linux-gnu/
> libpthread.so.0
> > #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> > (gdb) f 1
> > #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at
> ofproto/connmgr.c:982
> > 982   ofconn_set_protocol(CONST_CAST(struct ofconn *,
> ofconn),
> > (gdb) l
> > 977   {
> > 978   if (ofconn->protocol == OFPUTIL_P_NONE &&
> > 979   rconn_is_connected(ofconn->rconn)) {
> > 980   int version = rconn_get_version(ofconn->rconn);
> > 981   if (version > 0) {
> > 982   ofconn_set_protocol(CONST_CAST(struct ofconn *,
> ofconn),
> > 983   ofputil_protocol_from_ofp_
> version(version));
> > 984   }
> > 985   }
> > 986
> > (gdb) p *ofconn
> > $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash =
> 0, next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type =
> OFCONN_SERVICE, band = OFPROTO_IN_BAND, enable_async_msgs = true,
> >   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID,
> packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170,
> schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0,
> >   controller_id = 0, reply_counter = 

Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-17 Thread Ben Pfaff
It would be more helpful to have a simple reproduction case.

Why haven't you tried a newer version from branch-2.5?

On Tue, Jan 17, 2017 at 07:59:05AM -0800, Vidyasagara Guntaka wrote:
> Hi Ben,
> 
> Here is more debug information related to this incident (still using 
> version 2.5.0):
> 
> Summary :
> 
> We think that there is a race condition in ovs-vswitchd between OF 
> controller connection processing and packet-miss handling.
> 
> Reasoning :
> 
> Please consider the following GDB Debug Session:
> 
> Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, 
> protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999
> (gdb) f 2
> #2  0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at 
> ofproto/connmgr.c:1613
> 1613  enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
> (gdb) p *ofconn
> $2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, 
> next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, 
> band = OFPROTO_IN_BAND, enable_async_msgs = true, 
>   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, 
> packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, 
> schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, 
>   controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 
> 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 
> 0, n_modify = 0, 
>   first_op = -9223372036854775808, last_op = -9223372036854775808, 
> next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, 
> monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, 
> n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev 
> = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = 
> {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}}
> (gdb) bt
> #0  ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at 
> ofproto/connmgr.c:999
> #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at 
> ofproto/connmgr.c:982
> #2  0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at 
> ofproto/connmgr.c:1613
> #3  0x00435261 in rule_dpif_lookup_from_table (ofproto=0x16a6880, 
> version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, stats=0x0, 
> table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, 
> honor_table_miss=true) at ofproto/ofproto-dpif.c:3973
> #4  0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, 
> xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188
> #5  0x004481b1 in revalidate_ukey (udpif=0x16a7300, 
> ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, 
> reval_seq=585728, recircs=0x7f2ace7f8a30)
> at ofproto/ofproto-dpif-upcall.c:1866
> #6  0x00448fb2 in revalidate (revalidator=0x1691990) at 
> ofproto/ofproto-dpif-upcall.c:2186
> #7  0x0044593e in udpif_revalidator (arg=0x1691990) at 
> ofproto/ofproto-dpif-upcall.c:862
> #8  0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at 
> lib/ovs-thread.c:340
> #9  0x7f2ad75c2184 in start_thread () from 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> (gdb) f 1
> #1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at 
> ofproto/connmgr.c:982
> 982   ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn),
> (gdb) l
> 977   {
> 978   if (ofconn->protocol == OFPUTIL_P_NONE &&
> 979   rconn_is_connected(ofconn->rconn)) {
> 980   int version = rconn_get_version(ofconn->rconn);
> 981   if (version > 0) {
> 982   ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn),
> 983   
> ofputil_protocol_from_ofp_version(version));
> 984   }
> 985   }
> 986   
> (gdb) p *ofconn
> $3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, 
> next = 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, 
> band = OFPROTO_IN_BAND, enable_async_msgs = true, 
>   role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, 
> packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, 
> schedulers = {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, 
>   controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 
> 7, 0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 
> 0, n_modify = 0, 
>   first_op = -9223372036854775808, last_op = -9223372036854775808, 
> next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, 
> monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, 
> n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev 
> = 0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = 
> {buckets = 0x16d5938, one = 0x0, mask = 0, n = 0}}
> (gdb) p ofconn
> $4 = (const struct ofconn *) 0x16d5810
> (gdb) c
> Continuing.
> [Thread 0x7f2ad79f5980 (LWP 20165) exited]
> 
> 

Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-17 Thread Vidyasagara Guntaka via dev
Hi Ben,

Here is more debug information related to this incident (still using version 
2.5.0):

Summary :

We think that there is a race condition in ovs-vswitchd between OF controller 
connection processing and packet-miss handling.

Reasoning :

Please consider the following GDB Debug Session:

Breakpoint 1, ofconn_set_protocol (ofconn=0x16d5810, 
protocol=OFPUTIL_P_OF10_STD) at ofproto/connmgr.c:999
(gdb) f 2
#2  0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at 
ofproto/connmgr.c:1613
1613enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
(gdb) p *ofconn
$2 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, next 
= 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, band = 
OFPROTO_IN_BAND, enable_async_msgs = true, 
  role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, 
packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, schedulers 
= {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, 
  controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 7, 
0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 0, 
n_modify = 0, 
  first_op = -9223372036854775808, last_op = -9223372036854775808, 
next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, 
monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, 
n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev = 
0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = {buckets = 
0x16d5938, one = 0x0, mask = 0, n = 0}}
(gdb) bt
#0  ofconn_set_protocol (ofconn=0x16d5810, protocol=OFPUTIL_P_OF10_STD) at 
ofproto/connmgr.c:999
#1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at 
ofproto/connmgr.c:982
#2  0x0045f586 in connmgr_wants_packet_in_on_miss (mgr=0x16a6de0) at 
ofproto/connmgr.c:1613
#3  0x00435261 in rule_dpif_lookup_from_table (ofproto=0x16a6880, 
version=323, flow=0x7f2ace7f86e8, wc=0x7f2ace7f84b0, stats=0x0, 
table_id=0x7f2ace7f7eda "", in_port=28, may_packet_in=true, 
honor_table_miss=true) at ofproto/ofproto-dpif.c:3973
#4  0x00457ecf in xlate_actions (xin=0x7f2ace7f86e0, 
xout=0x7f2ace7f8010) at ofproto/ofproto-dpif-xlate.c:5188
#5  0x004481b1 in revalidate_ukey (udpif=0x16a7300, 
ukey=0x7f2ab80060e0, stats=0x7f2ace7f94e0, odp_actions=0x7f2ace7f8a40, 
reval_seq=585728, recircs=0x7f2ace7f8a30)
at ofproto/ofproto-dpif-upcall.c:1866
#6  0x00448fb2 in revalidate (revalidator=0x1691990) at 
ofproto/ofproto-dpif-upcall.c:2186
#7  0x0044593e in udpif_revalidator (arg=0x1691990) at 
ofproto/ofproto-dpif-upcall.c:862
#8  0x0050b93d in ovsthread_wrapper (aux_=0x16f4560) at 
lib/ovs-thread.c:340
#9  0x7f2ad75c2184 in start_thread () from 
/lib/x86_64-linux-gnu/libpthread.so.0
#10 0x7f2ad6de137d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) f 1
#1  0x0045e194 in ofconn_get_protocol (ofconn=0x16d5810) at 
ofproto/connmgr.c:982
982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn),
(gdb) l
977 {
978 if (ofconn->protocol == OFPUTIL_P_NONE &&
979 rconn_is_connected(ofconn->rconn)) {
980 int version = rconn_get_version(ofconn->rconn);
981 if (version > 0) {
982 ofconn_set_protocol(CONST_CAST(struct ofconn *, ofconn),
983 
ofputil_protocol_from_ofp_version(version));
984 }
985 }
986 
(gdb) p *ofconn
$3 = {node = {prev = 0x16a6e18, next = 0x16a6e18}, hmap_node = {hash = 0, next 
= 0x0}, connmgr = 0x16a6de0, rconn = 0x16edb50, type = OFCONN_SERVICE, band = 
OFPROTO_IN_BAND, enable_async_msgs = true, 
  role = OFPCR12_ROLE_EQUAL, protocol = OFPUTIL_P_OF10_STD_TID, 
packet_in_format = NXPIF_OPENFLOW10, packet_in_counter = 0x167a170, schedulers 
= {0x0, 0x0}, pktbuf = 0x0, miss_send_len = 0, 
  controller_id = 0, reply_counter = 0x1673190, master_async_config = {3, 7, 7, 
0, 0, 0}, slave_async_config = {0, 7, 0, 0, 0, 0}, n_add = 0, n_delete = 0, 
n_modify = 0, 
  first_op = -9223372036854775808, last_op = -9223372036854775808, 
next_op_report = 9223372036854775807, op_backoff = -9223372036854775808, 
monitors = {buckets = 0x16d58f0, one = 0x0, mask = 0, 
n = 0}, monitor_paused = 0, monitor_counter = 0x16759f0, updates = {prev = 
0x16d5918, next = 0x16d5918}, sent_abbrev_update = false, bundles = {buckets = 
0x16d5938, one = 0x0, mask = 0, n = 0}}
(gdb) p ofconn
$4 = (const struct ofconn *) 0x16d5810
(gdb) c
Continuing.
[Thread 0x7f2ad79f5980 (LWP 20165) exited]

From the above GDB session, ovs-vswitchd is in the middle of processing a 
packet miss that was read from the datapath.
The breakpoint was set inside ofconn_set_protocol so that we hit it if the 
protocol was already set to something other than OFPUTIL_P_NONE and is now 
being set to OFPUTIL_P_OF10_STD.
Yes, we modified the code in ofconn_set_protocol with this if 
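(A similar trap can be set without modifying the source, using a conditional 
breakpoint; this assumes debug symbols and the same source layout as 2.5.0:)

gdb -p $(pidof ovs-vswitchd)
(gdb) break connmgr.c:999 if ofconn->protocol != OFPUTIL_P_NONE && protocol == OFPUTIL_P_OF10_STD
(gdb) continue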

Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-13 Thread Samuel Jean via dev
Thanks for the quick follow up Ben,

So we'll indeed try against the latest versions to rule out the possibility of
a bug that has already been fixed, although I could not find any commit that
mentions it.  We'll report back here.

At this moment, we can reproduce it over and over within minutes.  We've
nailed it down -- or at least we think so -- to something related to a race
condition or memory overwrite between the time connection negotiation
happens and the flow mod packet arrives.  We were able to run the stress
test for hours when we used --flow-format=NXM+table_id as an argument to
ovs-ofctl, but eventually we hit the same error.
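(An invocation of that form looks roughly like this, using one of the entries 
from the ofpcrash.sh script as an example:)

ovs-ofctl --flow-format=NXM+table_id add-flow br0 \
    'priority=100,table=25,idle_timeout=0,actions=resubmit(,35)'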

Sagar is spending more time debugging this issue so maybe he'll be able to
provide more information.

On Fri, Jan 13, 2017 at 1:45 PM, Ben Pfaff  wrote:

> On Thu, Jan 12, 2017 at 03:54:42PM -0500, Samuel Jean via dev wrote:
> > It seems that shelling out to ovs-ofctl very quickly can lead to a bug
> > where it reports an OFPT_ERROR.
> >
> > We were able to consistently reproduce this within minutes of running the
> > above flow modifications on Ubuntu.
> >
> > Any help, hints or guidance would be appreciated.  I'd be happy to pursue
> > some debugging that would be required to nail down the issue here.
>
> Thanks for the bug report and especially for the detailed reproduction
> advice.
>
> I've now tried running this reproduction case against Open vSwitch from
> latest master and against the latest versions from the 2.6.x and 2.5.x
> branches, and I can't see any failures even after letting the script run
> for a few minutes.
>
> Maybe you should try 2.5.1 or the latest from branch-2.5 and see if it
> fixes the problem?  And if not, then we'll have to figure out what's
> different between your setup and mine.
>


Re: [ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-13 Thread Ben Pfaff
On Thu, Jan 12, 2017 at 03:54:42PM -0500, Samuel Jean via dev wrote:
> It seems that shelling out to ovs-ofctl very quickly can lead to a bug where
> it reports an OFPT_ERROR.
> 
> We were able to consistently reproduce this within minutes of running the
> above flow modifications on Ubuntu.
> 
> Any help, hints or guidance would be appreciated.  I'd be happy to pursue
> some debugging that would be required to nail down the issue here.

Thanks for the bug report and especially for the detailed reproduction
advice.

I've now tried running this reproduction case against Open vSwitch from
latest master and against the latest versions from the 2.6.x and 2.5.x
branches, and I can't see any failures even after letting the script run
for a few minutes.

Maybe you should try 2.5.1 or the latest from branch-2.5 and see if it
fixes the problem?  And if not, then we'll have to figure out what's
different between your setup and mine.
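(Roughly, building the tip of branch-2.5 looks like this; see the project's 
installation documentation for the authoritative steps:)

git clone https://github.com/openvswitch/ovs.git
cd ovs
git checkout branch-2.5
./boot.sh && ./configure && make && sudo make install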


[ovs-dev] [BUG] ovs-ofctl version 2.5.0 will crash with OFPFMFC_BAD_COMMAND

2017-01-12 Thread Samuel Jean via dev
Hi,

It seems that shelling out to ovs-ofctl very quickly can lead to a bug where
it reports an OFPT_ERROR.

We were able to consistently reproduce this within minutes of running the
flow modifications below on Ubuntu.

Any help, hints or guidance would be appreciated.  I'd be happy to pursue
some debugging that would be required to nail down the issue here.

Best regards,
Sam Jean

# cat ./ofpcrash.sh
#!/bin/sh
ovs-ofctl add-flow br0 \
'priority=100,table=25,idle_timeout=0,actions=resubmit(,35)' || exit 1
ovs-ofctl add-flow br0 \
'priority=100,table=35,idle_timeout=0,actions=resubmit(,45)' || exit 1
ovs-ofctl add-flow br0 \
'priority=100,table=45,idle_timeout=0,actions=resubmit(,50)' || exit 1
ovs-ofctl add-flow br0 \
'priority=100,table=50,idle_timeout=0,actions=resubmit(,65)' || exit 1
ovs-ofctl add-flow br0 \
'priority=100,table=65,idle_timeout=0,actions=output:1' || exit 1
ovs-ofctl add-flow br0 \
'priority=3000,ip,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,table=0,idle_timeout=0,actions=drop' \
|| exit 1
ovs-ofctl add-flow br0 \
'priority=1000,ip,in_port=1,dl_dst=0c:01:00:12:cf:01,table=0,idle_timeout=0,cookie=20250774994944,actions=resubmit(,25)' \
|| exit 1

# while true; do ./ofpcrash.sh || break; done
OFPT_ERROR (xid=0x4): OFPFMFC_BAD_COMMAND
OFPT_FLOW_MOD (xid=0x4):
(***truncated to 64 bytes from 88***)
  01 0e 00 58 00 00 00 04-00 38 20 ff 00 00 00 00 |...X.8 .|
0010  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ||
0020  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ||
0030  00 00 00 00 00 00 00 00-32 00 00 00 00 00 00 64 |2..d|

# ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.5.0
Compiled Sep 15 2016 12:55:18
OpenFlow versions 0x1:0x4

# uname -srmvp
Linux 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016
x86_64 x86_64