Hi Ben,

Thank you for reviewing the patch. Please see inline for comments.

-----Original Message-----
From: Ben Pfaff [mailto:[email protected]] 
Sent: Tuesday, December 05, 2017 12:52 AM
To: Vishal Deep Ajmera <[email protected]>
Cc: [email protected]
Subject: Re: [ovs-dev] [PATCH] ofproto-dpif-xlate: Incorrect handling of errors 
in group action processing

On Mon, Dec 04, 2017 at 08:47:38AM +0000, Vishal Deep Ajmera wrote:
> As per OpenFlow v1.3 specification, when an action list contains a 
> group action a copy of the packet is passed to the group for 
> processing by the group. This means that if there is an error 
> encountered during group processing, only the copy of packet should be 
> dropped, but subsequent actions in the action list should be executed on the 
> original packet.
> 
> Additionally, if the group type is "ALL", each action bucket of the 
> group should process a copy of the packet. If there is an error while 
> processing one bucket other buckets should still be processed.
> 
> Example 1:
> table=0,in_port=tap0 actions=output:tap1,group:10,output:tap2
> 
> Even if any error is encountered while processing the group action, 
> the packet should still be forwarded to ports tap1 and tap2.
> 
> Example 2:
> group_id=1,type=all,bucket=actions=output:tap1,bucket=actions=encap(et
> h)
> 
> Even if processing the action in the second bucket fails because the 
> packet already has an Ethernet header, the other copy of the packet 
> should still be processed by the first bucket and output to port tap1.
> 
> Currently the error handling in OVS does not comply with those rules. 
> When any group bucket execution fails the error is recorded in the 
> so-called "translation context" which is global for the processing of 
> the original packet. Once an error is recorded, OVS skips processing 
> subsequent buckets and installs a drop action in the datapath even if 
> parts of the action list were previously processed successfully.
> 
> This patch clears the error flag after any bucket of a group is executed.
> This way the error encountered in processing any bucket of the group 
> will not impact other actions of action-list or other buckets of the group.
> 
> Signed-off-by: Vishal Deep Ajmera <[email protected]>
> Signed-off-by: Keshav Gupta <[email protected]>

Thank you for thinking about this issue.  Clearly there is something to be done 
here.

When I look at the uses of translation errors, I see now that there are a few 
categories:

   1. Translation can't start at all: XLATE_NO_RECIRCULATION_CONTEXT,
      XLATE_BRIDGE_NOT_FOUND, XLATE_RECIRCULATION_CONFLICT,
      XLATE_INVALID_TUNNEL_METADATA.

   2. Translation has run too long or used too much space and must end
      to bound memory or CPU time.  XLATE_RECURSION_TOO_DEEP and
      XLATE_TOO_MANY_RESUBMITS are in this category for time,
      XLATE_STACK_TOO_DEEP for space.

   3. Some step can't be executed: XLATE_TOO_MANY_MPLS_LABELS,
      XLATE_UNSUPPORTED_PACKET_TYPE.

I believe that the appropriate handling varies among category.  Category
1 is fundamentally unrecoverable.  For category 2, we must not try to recover 
because that would extend CPU time (perhaps exponentially) or it would cause 
malfunctions (because a later "pop" would miss out on some data that wasn't 
"push"ed).  For category 3, we could recover at an appropriate point in the way 
you suggest.
=======
[Vishal/Keshav] Any packet modification done in group bucket processing is 
local to bucket only and original packet is retained (as it was before entering 
the group processing) after bucket is executed. OVS takes care of this by 
restoring the flow back again in ctx->flow in xlate_group_bucket ().  Now if 
there are errors in any action in the bucket then the bucket processing will 
halt for the packet. Also any changes to the ctx->flow done in the bucket will 
not be committed in ctx->odp_actions. So, if we reset the ctx->error back to 
old value (which must be 0) we still will be able to process further actions in 
the action-list. 

Is our understanding correct ? Or are we missing some case which can 
potentially break because of this change ? We think that 
XLATE_RECURSION_TOO_DEEP and XLATE_TOO_MANY_RESUBMITS should be treated local 
for group processing. Ideally we should also be restoring the ctx->resubmits to 
old value after bucket processing is complete to make sure one bucket does not 
disrupt processing (exhaust number of resubmits) of other buckets or other 
actions of the action-list. ctx->depth parameter is already adjusted to 
original value after bucket processing is done in xlate_group_bucket ().

We agree that for cases where errors are global in nature and can cause data 
path issues should not be masked for e.g. XLATE_STACK_TOO_DEEP. Is there any 
more such errors we should not mask ?
=======

I think that the patch should be rewritten to make these more fine grained 
distinctions.  I don't think that category 1 needs special handling because 
this code should never see such an error, but it seems important to distinguish 
categories 2 and 3.

Thanks,

Ben.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to