Re: [ovs-dev] [PATCHv2 1/2] xlate: auto ofproto trace when recursion too deep

2018-02-28 Thread Ben Pfaff
On Wed, Feb 28, 2018 at 03:01:24PM -0800, William Tu wrote:
> On Wed, Feb 28, 2018 at 2:14 PM, Ben Pfaff  wrote:
> > On Mon, Feb 26, 2018 at 01:43:39PM -0800, William Tu wrote:
> >> On Wed, Feb 14, 2018 at 3:03 PM, Ben Pfaff  wrote:
> >> > On Wed, Feb 14, 2018 at 10:56:08AM -0800, William Tu wrote:
> >> >> Usually ofproto/trace is used to debug the flow translation error.
> >> >> When translation error such as recursion too deep or too many resubmit,
> >> >> the issue might happen momentary; flows causing the recursion expire
> >> >> when users try to debug it.  This patch enables the ofproto trace
> >> >> automatically when recursion is too deep or too many resubmit, by
> >> >> invoking the translation again, and log the ofproto trace as warnings.
> >> >> Since the log will be huge, rate limit to one per minute.
> >> >>
> >> >> VMWare-BZ: #2054659
> >> >> Signed-off-by: William Tu 
> >> >> Tested-by: Greg Rose 
> >> >> Reviewed-by: Greg Rose 
> >> >
> >> > Thanks for working on this!
> >> >
> >> > Some of the data passed to ofproto_trace() is also passed to the
> >> > xlate_actions() call, indirectly.  Did you check whether that data is
> >> > possibly modified by xlate_actions()?  If it is, then we might have to
> >> > reconsider this approach, because flow data, etc. is very large and I
> >> > don't think that we can afford to always make a copy of it in advance on
> >> > the chance that the original might be needed for tracing later.
> >>
> >> We pass the "const struct flow *" and "const struct dp_packet *" to
> >> the ofproto_trace(),
> >> so I think these two data is unmodified and xlate_in_init() actually make 
> >> a copy
> >> of the "struct flow", so xlate_actions() modifies a separate copy of the 
> >> flow.
> >
> > OK.
> >
> >> > I think that VLOG_WARN is a very high log level for this data.  I would
> >> > tend to use DBG.
> >> >
> >> I worried that most of the time DBG is not on, and asking people
> >> to turn it on and reproduce the issue is hard.  Can we keep VLOG_WARN and
> >> rate limit to very low frequent, "VLOG_RATE_LIMIT_INIT(1, 5);",?
> >
> > OK, let's try that.
> 
> Thanks Ben.
> Should I send another version of the patch?

Yes please.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2 1/2] xlate: auto ofproto trace when recursion too deep

2018-02-28 Thread Ben Pfaff
On Mon, Feb 26, 2018 at 01:43:39PM -0800, William Tu wrote:
> On Wed, Feb 14, 2018 at 3:03 PM, Ben Pfaff  wrote:
> > On Wed, Feb 14, 2018 at 10:56:08AM -0800, William Tu wrote:
> >> Usually ofproto/trace is used to debug the flow translation error.
> >> When translation error such as recursion too deep or too many resubmit,
> >> the issue might happen momentary; flows causing the recursion expire
> >> when users try to debug it.  This patch enables the ofproto trace
> >> automatically when recursion is too deep or too many resubmit, by
> >> invoking the translation again, and log the ofproto trace as warnings.
> >> Since the log will be huge, rate limit to one per minute.
> >>
> >> VMWare-BZ: #2054659
> >> Signed-off-by: William Tu 
> >> Tested-by: Greg Rose 
> >> Reviewed-by: Greg Rose 
> >
> > Thanks for working on this!
> >
> > Some of the data passed to ofproto_trace() is also passed to the
> > xlate_actions() call, indirectly.  Did you check whether that data is
> > possibly modified by xlate_actions()?  If it is, then we might have to
> > reconsider this approach, because flow data, etc. is very large and I
> > don't think that we can afford to always make a copy of it in advance on
> > the chance that the original might be needed for tracing later.
> 
> We pass the "const struct flow *" and "const struct dp_packet *" to
> the ofproto_trace(),
> so I think these two data is unmodified and xlate_in_init() actually make a 
> copy
> of the "struct flow", so xlate_actions() modifies a separate copy of the flow.

OK.

> > I think that VLOG_WARN is a very high log level for this data.  I would
> > tend to use DBG.
> >
> I worried that most of the time DBG is not on, and asking people
> to turn it on and reproduce the issue is hard.  Can we keep VLOG_WARN and
> rate limit to very low frequent, "VLOG_RATE_LIMIT_INIT(1, 5);",?

OK, let's try that.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2 1/2] xlate: auto ofproto trace when recursion too deep

2018-02-26 Thread William Tu
On Wed, Feb 14, 2018 at 3:03 PM, Ben Pfaff  wrote:
> On Wed, Feb 14, 2018 at 10:56:08AM -0800, William Tu wrote:
>> Usually ofproto/trace is used to debug the flow translation error.
>> When translation error such as recursion too deep or too many resubmit,
>> the issue might happen momentary; flows causing the recursion expire
>> when users try to debug it.  This patch enables the ofproto trace
>> automatically when recursion is too deep or too many resubmit, by
>> invoking the translation again, and log the ofproto trace as warnings.
>> Since the log will be huge, rate limit to one per minute.
>>
>> VMWare-BZ: #2054659
>> Signed-off-by: William Tu 
>> Tested-by: Greg Rose 
>> Reviewed-by: Greg Rose 
>
> Thanks for working on this!
>
> Some of the data passed to ofproto_trace() is also passed to the
> xlate_actions() call, indirectly.  Did you check whether that data is
> possibly modified by xlate_actions()?  If it is, then we might have to
> reconsider this approach, because flow data, etc. is very large and I
> don't think that we can afford to always make a copy of it in advance on
> the chance that the original might be needed for tracing later.

We pass the "const struct flow *" and "const struct dp_packet *" to
the ofproto_trace(),
so I think these two data is unmodified and xlate_in_init() actually make a copy
of the "struct flow", so xlate_actions() modifies a separate copy of the flow.

>
> I think that VLOG_WARN is a very high log level for this data.  I would
> tend to use DBG.
>
I worried that most of the time DBG is not on, and asking people
to turn it on and reproduce the issue is hard.  Can we keep VLOG_WARN and
rate limit to very low frequent, "VLOG_RATE_LIMIT_INIT(1, 5);",?

Thanks
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2 1/2] xlate: auto ofproto trace when recursion too deep

2018-02-14 Thread Ben Pfaff
On Wed, Feb 14, 2018 at 10:56:08AM -0800, William Tu wrote:
> Usually ofproto/trace is used to debug the flow translation error.
> When translation error such as recursion too deep or too many resubmit,
> the issue might happen momentary; flows causing the recursion expire
> when users try to debug it.  This patch enables the ofproto trace
> automatically when recursion is too deep or too many resubmit, by
> invoking the translation again, and log the ofproto trace as warnings.
> Since the log will be huge, rate limit to one per minute.
> 
> VMWare-BZ: #2054659
> Signed-off-by: William Tu 
> Tested-by: Greg Rose 
> Reviewed-by: Greg Rose 

Thanks for working on this!

Some of the data passed to ofproto_trace() is also passed to the
xlate_actions() call, indirectly.  Did you check whether that data is
possibly modified by xlate_actions()?  If it is, then we might have to
reconsider this approach, because flow data, etc. is very large and I
don't think that we can afford to always make a copy of it in advance on
the chance that the original might be needed for tracing later.

I think that VLOG_WARN is a very high log level for this data.  I would
tend to use DBG.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2 1/2] xlate: auto ofproto trace when recursion too deep

2018-02-14 Thread William Tu
Usually ofproto/trace is used to debug the flow translation error.
When translation error such as recursion too deep or too many resubmit,
the issue might happen momentary; flows causing the recursion expire
when users try to debug it.  This patch enables the ofproto trace
automatically when recursion is too deep or too many resubmit, by
invoking the translation again, and log the ofproto trace as warnings.
Since the log will be huge, rate limit to one per minute.

VMWare-BZ: #2054659
Signed-off-by: William Tu 
Tested-by: Greg Rose 
Reviewed-by: Greg Rose 
---
 ofproto/ofproto-dpif-trace.c  |  7 +--
 ofproto/ofproto-dpif-trace.h  |  6 ++
 ofproto/ofproto-dpif-upcall.c | 22 +-
 tests/ofproto-dpif.at | 25 +
 4 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/ofproto/ofproto-dpif-trace.c b/ofproto/ofproto-dpif-trace.c
index 4071cc0ffd65..7211eac8e1a5 100644
--- a/ofproto/ofproto-dpif-trace.c
+++ b/ofproto/ofproto-dpif-trace.c
@@ -23,11 +23,6 @@
 #include "ofproto-dpif-xlate.h"
 #include "unixctl.h"
 
-static void ofproto_trace(struct ofproto_dpif *, const struct flow *,
-  const struct dp_packet *packet,
-  const struct ofpact[], size_t ofpacts_len,
-  struct ovs_list *next_ct_states,
-  struct ds *);
 static void oftrace_node_destroy(struct oftrace_node *);
 
 /* Creates a new oftrace_node, populates it with the given 'type' and a copy of
@@ -661,7 +656,7 @@ ofproto_trace__(struct ofproto_dpif *ofproto, const struct 
flow *flow,
  *
  * If 'ofpacts' is nonnull then its 'ofpacts_len' bytes specify the actions to
  * trace, otherwise the actions are determined by a flow table lookup. */
-static void
+void
 ofproto_trace(struct ofproto_dpif *ofproto, const struct flow *flow,
   const struct dp_packet *packet,
   const struct ofpact ofpacts[], size_t ofpacts_len,
diff --git a/ofproto/ofproto-dpif-trace.h b/ofproto/ofproto-dpif-trace.h
index 5e51771b19b0..ea6cb3fc01ac 100644
--- a/ofproto/ofproto-dpif-trace.h
+++ b/ofproto/ofproto-dpif-trace.h
@@ -28,6 +28,8 @@
  * each action (OFT_ACTION) executed in the table.
  */
 
+#include "openvswitch/dynamic-string.h"
+#include "ofproto/ofproto-dpif.h"
 #include "openvswitch/compiler.h"
 #include "openvswitch/list.h"
 #include "flow.h"
@@ -79,6 +81,10 @@ struct oftrace_next_ct_state {
 };
 
 void ofproto_dpif_trace_init(void);
+void ofproto_trace(struct ofproto_dpif *ofproto, const struct flow *flow,
+  const struct dp_packet *packet,
+  const struct ofpact *, size_t ofpacts_len,
+  struct ovs_list *next_ct_states, struct ds *output);
 
 struct oftrace_node *oftrace_report(struct ovs_list *, enum oftrace_node_type,
 const char *text);
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 5eb20f7cc236..1741741fb5c5 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -34,6 +34,7 @@
 #include "ofproto-dpif-sflow.h"
 #include "ofproto-dpif-xlate.h"
 #include "ofproto-dpif-xlate-cache.h"
+#include "ofproto-dpif-trace.h"
 #include "ovs-rcu.h"
 #include "packets.h"
 #include "openvswitch/poll-loop.h"
@@ -1130,7 +1131,9 @@ upcall_xlate(struct udpif *udpif, struct upcall *upcall,
  struct ofpbuf *odp_actions, struct flow_wildcards *wc)
 {
 struct dpif_flow_stats stats;
+enum xlate_error xerr;
 struct xlate_in xin;
+struct ds output;
 
 stats.n_packets = 1;
 stats.n_bytes = dp_packet_size(upcall->packet);
@@ -1165,7 +1168,24 @@ upcall_xlate(struct udpif *udpif, struct upcall *upcall,
 upcall->dump_seq = seq_read(udpif->dump_seq);
 upcall->reval_seq = seq_read(udpif->reval_seq);
 
-xlate_actions(, >xout);
+xerr = xlate_actions(, >xout);
+
+/* Translate again and log the ofproto trace for
+ * these two error types. */
+if (xerr == XLATE_RECURSION_TOO_DEEP ||
+xerr == XLATE_TOO_MANY_RESUBMITS) {
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+
+/* This is a huge log, so be conservative. */
+if (!VLOG_DROP_WARN()) {
+ds_init();
+ofproto_trace(upcall->ofproto, upcall->flow,
+  upcall->packet, NULL, 0, NULL, );
+VLOG_WARN("%s", ds_cstr());
+ds_destroy();
+}
+}
+
 if (wc) {
 /* Convert the input port wildcard from OFP to ODP format. There's no
  * real way to do this for arbitrary bitmasks since the numbering 
spaces
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 600afdda8528..fd78c5d9b593 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -8473,6 +8473,31 @@ AT_CHECK([grep -c 'over max translation depth 64' 
stdout],
 OVS_VSWITCHD_STOP(["/resubmit actions