On Mon, Dec 04, 2017 at 08:47:38AM +0000, Vishal Deep Ajmera wrote:
> As per OpenFlow v1.3 specification, when an action list contains a group
> action a copy of the packet is passed to the group for processing by the
> group. This means that if there is an error encountered during group
> processing, only the copy of packet should be dropped, but subsequent
> actions in the action list should be executed on the original packet.
> 
> Additionally, if the group type is "ALL", each action bucket of the group
> should process a copy of the packet. If there is an error while processing
> one bucket other buckets should still be processed.
> 
> Example 1:
> table=0,in_port=tap0 actions=output:tap1,group:10,output:tap2
> 
> Even if any error is encountered while processing the group action, the
> packet should still be forwarded to ports tap1 and tap2.
> 
> Example 2:
> group_id=1,type=all,bucket=actions=output:tap1,bucket=actions=encap(eth)
> 
> Even if processing the action in the second bucket fails because the
> packet already has an Ethernet header, the other copy of the packet should
> still be processed by the first bucket and output to port tap1.
> 
> Currently the error handling in OVS does not comply with those rules. When
> any group bucket execution fails the error is recorded in the so-called
> "translation context" which is global for the processing of the original
> packet. Once an error is recorded, OVS skips processing subsequent buckets
> and installs a drop action in the datapath even if parts of the action list
> were previously processed successfully.
> 
> This patch clears the error flag after any bucket of a group is executed.
> This way the error encountered in processing any bucket of the group will
> not impact other actions of action-list or other buckets of the group.
> 
> Signed-off-by: Vishal Deep Ajmera <[email protected]>
> Signed-off-by: Keshav Gupta <[email protected]>

Thank you for thinking about this issue.  Clearly there is something to
be done here.

When I look at the uses of translation errors, I see now that there are
a few categories:

   1. Translation can't start at all: XLATE_NO_RECIRCULATION_CONTEXT,
      XLATE_BRIDGE_NOT_FOUND, XLATE_RECIRCULATION_CONFLICT,
      XLATE_INVALID_TUNNEL_METADATA.

   2. Translation has run too long or used too much space and must end
      to bound memory or CPU time.  XLATE_RECURSION_TOO_DEEP and
      XLATE_TOO_MANY_RESUBMITS are in this category for time,
      XLATE_STACK_TOO_DEEP for space.

   3. Some step can't be executed: XLATE_TOO_MANY_MPLS_LABELS,
      XLATE_UNSUPPORTED_PACKET_TYPE.

I believe that the appropriate handling varies among category.  Category
1 is fundamentally unrecoverable.  For category 2, we must not try to
recover because that would extend CPU time (perhaps exponentially) or it
would cause malfunctions (because a later "pop" would miss out on some
data that wasn't "push"ed).  For category 3, we could recover at an
appropriate point in the way you suggest.

I think that the patch should be rewritten to make these more fine
grained distinctions.  I don't think that category 1 needs special
handling because this code should never see such an error, but it seems
important to distinguish categories 2 and 3.

Thanks,

Ben.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to