On Thu, Nov 6, 2025 at 11:34 AM Aaron Conole via dev <
[email protected]> wrote:

> Ilya Maximets <[email protected]> writes:
>
> > This change attempts to track the actual available stack and stop
> > early if the flow translation logic is about to overflow it.
> >
> > Unlike the recursion depth counter, this approach allows to track the
> > actual stack usage and bail early even if the recursion depth is not
> > reached yet.  This is important because different actions have vastly
> > different stack requirements and different systems have different
> > amount of stack allocated per thread by default.
>
> This is an interesting approach, for some definitions of interesting :)
>
> As you note a bit below, it really depends on a lot - namely that the
> code was built in such a way that getting the frame address works via
> the built-in (I'm thinking about the weirdness of alpha architecture and
> -fomit-frame-pointer), and that the compiler can support giving this
> information.
>
> > Should work with both GCC and Clang, will likely not work on Windows.
> > The change should have no effect on platforms / compilers that do not
> > support checking the current stack frame address via builtins.
> >
> > The main thread is not treated fairly.  At least on Linux the main
> > thread can grow its stack if the limit is dynamically increased.
> > That is not normally true for other threads.  However, this patch
> > sticks to initial stack size even for the main thread.  This should
> > not be a problem for OVS though, as vast majority of all the packet
> > processing is normally done by handlers, revalidators or PMD threads.
> >
> > Unlike the previous RFC version of this change [1], we're not trying
> > to work around the stack limits by recirculating packets through the
> > datapath.  Practice shows that such techniques may lead to self-DoS,
> > overwhelming OVS with upcalls.  See the ovn-controller self-DoS issue
> > fixed recently in OVN:
> >
> https://mail.openvswitch.org/pipermail/ovs-dev/2025-October/426746.html
> > So, it's safer to just drop the packets and only execute actions that
> > we can translate safely.  All in all, stack exhaustion usually means
> > a loop or otherwise a very inefficient OpenFlow pipeline that should
> > be fixed by the users.  Attempts to process the whole thing would only
> > mask the problem.
> >
> > It seems hard to create a unit test for this, as support for measuring
> > the actual stack depth as well as the amount of space each stack frame
> > takes and the ways to limit the stack largely depend on a platform and
> > a compiler.  Can be tested manually with an infinite resubmit case:
>
> Although it would be more work, maybe creating our own stack structure
> and iterating, while maintaining a separate agnostic stack would make
> sense.  Something like::
>
>  #define XLATE_MAX_OPERATION_DEPTH 64
>
>  struct xlate_operation {
>         cur_state;
>         ofpacts;
>         ofpacts_Len;
>         flow;
>         port;
>         recirc;
>  };
>
>  struct xlate_op_stack {
>         top_ptr;
>         struct xlate_operation frames[XLATE_MAX_OPERATION_DEPTH];
>  };
>
> That does require rewriting things like xlate_recursively,
> do_xlate_actions, etc. to indicate that they need a new operation, and
> set the appropriate state so that a while loop can call the right piece.
>
> BUT - I fully concede it's much more work than this approach, and at
> least this approach gives an early bail out *now* rather than major
> churn associated with reworking the xlate pipeline.
>

I agree with this. We keep on running into recursion issues, evaluating
actions iteratively would solve many problems. However this would be a
significant effort. One problem I have been thinking about is how a
branching operation like xlate_check_pkt_larger would be handled.


>
> >   make -j8
> >   (ulimit -s 386; make sandbox)
> >   ovs-vsctl add-br br0
> >   ovs-vsctl add-port br0 p1
> >   ovs-ofctl del-flows br0
> >   (for i in $(seq 0 64); do
> >       j=$(expr $i + 1);
> >       echo "table=$i,
> actions=local,resubmit(,$j),local,resubmit(,$j),local";
> >    done;
> >    echo "table=65, actions=resubmit(,0)") > ./resubmits.txt
> >   ovs-ofctl add-flows br0 ./resubmits.txt
> >   ovs-appctl ofproto/trace br0 'in_port=1' > ./trace.txt
> >
> > [1]
> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/412048.html
> >
> > Signed-off-by: Ilya Maximets <[email protected]>
> > ---
> >  include/linux/openvswitch.h    |  1 +
> >  include/openvswitch/compiler.h | 11 ++++++++
> >  lib/odp-execute.c              |  4 +++
> >  lib/ovs-thread.c               | 47 ++++++++++++++++++++++++++++++++++
> >  lib/ovs-thread.h               |  8 ++++++
> >  lib/sat-math.h                 |  5 +---
> >  ofproto/ofproto-dpif-xlate.c   | 10 ++++++--
> >  vswitchd/ovs-vswitchd.c        |  1 +
> >  8 files changed, 81 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
> > index e69fed4b7..8e0b9ad46 100644
> > --- a/include/linux/openvswitch.h
> > +++ b/include/linux/openvswitch.h
> > @@ -475,6 +475,7 @@ enum xlate_error {
> >       XLATE_TUNNEL_OUTPUT_NO_ETHERNET,
> >       XLATE_TUNNEL_NEIGH_CACHE_MISS,
> >       XLATE_TUNNEL_HEADER_BUILD_FAILED,
> > +     XLATE_THREAD_STACK_EXHAUSTED,
> >       XLATE_MAX,
> >  };
> >  #endif
> > diff --git a/include/openvswitch/compiler.h
> b/include/openvswitch/compiler.h
> > index bd30369a7..352980e19 100644
> > --- a/include/openvswitch/compiler.h
> > +++ b/include/openvswitch/compiler.h
> > @@ -29,6 +29,9 @@
> >  #ifndef __has_attribute
> >    #define __has_attribute(x) 0
> >  #endif
> > +#ifndef __has_builtin
> > +  #define __has_builtin(x) 0
> > +#endif
> >
> >  /* To make OVS_NO_RETURN portable across gcc/clang and MSVC, it should
> be
> >   * added at the beginning of the function declaration. */
> > @@ -317,5 +320,13 @@
> >  #define BUILD_ASSERT_DECL_GCCONLY(EXPR) ((void) 0)
> >  #endif
> >
> > +/* OVS_FRAME_ADDRESS can be used to get the address of the current
> stack frame.
> > + * Note: Attempts to get address of any frame beside the current one
> (0) are
> > + * dangerous and can lead to crashes according to GCC documentation.  */
> > +#if __has_builtin(__builtin_frame_address)
> > +#define OVS_FRAME_ADDRESS() ((char *) __builtin_frame_address(0))
> > +#else
> > +#define OVS_FRAME_ADDRESS() ((char *) 0)
> > +#endif
> >
> >  #endif /* compiler.h */
> > diff --git a/lib/odp-execute.c b/lib/odp-execute.c
> > index ecbda8c01..2271e7d72 100644
> > --- a/lib/odp-execute.c
> > +++ b/lib/odp-execute.c
> > @@ -60,6 +60,7 @@ COVERAGE_DEFINE(drop_action_tunnel_routing_failed);
> >  COVERAGE_DEFINE(drop_action_tunnel_output_no_ethernet);
> >  COVERAGE_DEFINE(drop_action_tunnel_neigh_cache_miss);
> >  COVERAGE_DEFINE(drop_action_tunnel_header_build_failed);
> > +COVERAGE_DEFINE(drop_action_thread_stack_exhausted);
> >
> >  static void
> >  dp_update_drop_action_counter(enum xlate_error drop_reason,
> > @@ -116,6 +117,9 @@ dp_update_drop_action_counter(enum xlate_error
> drop_reason,
> >     case XLATE_TUNNEL_HEADER_BUILD_FAILED:
> >          COVERAGE_ADD(drop_action_tunnel_header_build_failed, delta);
> >          break;
> > +   case XLATE_THREAD_STACK_EXHAUSTED:
> > +        COVERAGE_ADD(drop_action_thread_stack_exhausted, delta);
> > +        break;
> >     case XLATE_MAX:
> >     default:
> >          VLOG_ERR_RL(&rl, "Invalid Drop reason type: %d", drop_reason);
> > diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
> > index 243791de8..a390ed4f5 100644
> > --- a/lib/ovs-thread.c
> > +++ b/lib/ovs-thread.c
> > @@ -404,6 +404,52 @@ ovsthread_id_init(void)
> >      return *ovsthread_id_get() = atomic_count_inc(&next_id);
> >  }
> >
> > +DEFINE_STATIC_PER_THREAD_DATA(uintptr_t, ovsthread_stack_base, 0);
> > +DEFINE_STATIC_PER_THREAD_DATA(size_t, ovsthread_stack_size, 0);
> > +
> > +void
> > +ovsthread_stack_init(void)
> > +{
> > +    pthread_attr_t attr;
> > +    size_t stacksize;
> > +    int error;
> > +
> > +    ovs_assert(*ovsthread_stack_base_get() == 0);
> > +    *ovsthread_stack_base_get() = (uintptr_t) OVS_FRAME_ADDRESS();
> > +
> > +    pthread_attr_init(&attr);
>
> Since we're already in the realm of needing special support and
> non-portable stuff, maybe something like pthread_getattr_np() to get the
> actual configuration rather than whatever is in the default thread attr?
> We can always fall to pthread_attr_init in cases where that call doesn't
> exist.
>

I agree that pthread_getattr_np and getting the actual size seems more
correct here. Musl supports this function and the BSD pthread
implementation has the similar pthread_attr_get_np.


>
> We could also pass a flag that will use getrlimit(RLIMIT_STACK) for the
> main thread, and that should be 'close' to accurate.
>
> > +
> > +    error = pthread_attr_getstacksize(&attr, &stacksize);
> > +    if (error) {
> > +        VLOG_ABORT("pthread_attr_getstacksize failed: %s",
> > +                   ovs_strerror(error));
> > +    }
> > +    *ovsthread_stack_size_get() = stacksize;
> > +    pthread_attr_destroy(&attr);
> > +}
> > +
> > +bool
> > +ovsthread_low_on_stack(void)
> > +{
> > +    uintptr_t curr = (uintptr_t) OVS_FRAME_ADDRESS();
> > +    uintptr_t base = *ovsthread_stack_base_get();
> > +    size_t size = *ovsthread_stack_size_get();
> > +    size_t used;
> > +
> > +    if (!curr || !base || !size) {
> > +        /* Either not initialized or not supported. */
> > +        return false;
> > +    }
> > +
> > +    used = (base > curr) ? base - curr : curr - base;
> > +
> > +    /* Consider 'low' as less than a 100 KB. */
> > +    if (size < used + 100 * 1024) {
>

How much stack does a resubmit usually take? 100KB seems somewhat arbitrary
and high.


Thank you,
M


> > +        return true;
> > +    }
> > +    return false;
> > +}
> > +
> >  static void *
> >  ovsthread_wrapper(void *aux_)
> >  {
> > @@ -412,6 +458,7 @@ ovsthread_wrapper(void *aux_)
> >      unsigned int id;
> >
> >      id = ovsthread_id_init();
> > +    ovsthread_stack_init();
> >
> >      aux = *auxp;
> >      free(auxp);
> > diff --git a/lib/ovs-thread.h b/lib/ovs-thread.h
> > index d37f4ffc4..55c9859b9 100644
> > --- a/lib/ovs-thread.h
> > +++ b/lib/ovs-thread.h
> > @@ -474,6 +474,14 @@ ovsthread_id_self(void)
> >      return id;
> >  }
> >
> > +/* Enables use of ovsthread_low_on_stack().  Must be called by a new
> thread
> > + * early after the thread creation, e.g. called from
> ovs_thread_create(). */
> > +void ovsthread_stack_init(void);
> > +
> > +/* Returns 'true' if the current thread used up most of its stack space.
> > + * Can be used, for example, to check if recursion has to be stopped. */
> > +bool ovsthread_low_on_stack(void);
> > +
> >  /* Simulated global counter.
> >   *
> >   * Incrementing such a counter is meant to be cheaper than incrementing
> a
> > diff --git a/lib/sat-math.h b/lib/sat-math.h
> > index d66872387..34b939277 100644
> > --- a/lib/sat-math.h
> > +++ b/lib/sat-math.h
> > @@ -18,12 +18,9 @@
> >  #define SAT_MATH_H 1
> >
> >  #include <limits.h>
> > +#include "openvswitch/compiler.h"
> >  #include "openvswitch/util.h"
> >
> > -#ifndef __has_builtin
> > -#define __has_builtin(x) 0
> > -#endif
> > -
> >  /* Returns x + y, clamping out-of-range results into the range of the
> return
> >   * type. */
> >  static inline unsigned int
> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> > index 2c8197fb7..1af2c4118 100644
> > --- a/ofproto/ofproto-dpif-xlate.c
> > +++ b/ofproto/ofproto-dpif-xlate.c
> > @@ -463,6 +463,8 @@ const char *xlate_strerror(enum xlate_error error)
> >          return "Tunnel neighbor cache miss";
> >      case XLATE_TUNNEL_HEADER_BUILD_FAILED:
> >          return "Native tunnel header build failed";
> > +    case XLATE_THREAD_STACK_EXHAUSTED:
> > +        return "Thread stack exhausted";
> >      case XLATE_MAX:
> >          break;
> >      }
> > @@ -4702,6 +4704,9 @@ xlate_resubmit_resource_check(struct xlate_ctx
> *ctx)
> >      } else if (ctx->stack.size >= 65536) {
> >          xlate_report_error(ctx, "resubmits yielded over 64 kB of
> stack");
> >          ctx->error = XLATE_STACK_TOO_DEEP;
> > +    } else if (ovsthread_low_on_stack()) {
> > +        xlate_report_error(ctx, "thread stack exhausted.");
> > +        ctx->error = XLATE_THREAD_STACK_EXHAUSTED;
>
> I am a bit confused why we create a new stack exhaustion xlate error
> here.  Any reason not to reuse the STACK_TOO_DEEP error case?  Is it
> because the specifics of the error report?
>
> >      } else {
> >          return true;
> >      }
> > @@ -4906,8 +4911,9 @@ xlate_group_bucket(struct xlate_ctx *ctx, struct
> ofputil_bucket *bucket,
> >       *
> >       * Exception to above is errors which are system limits to protect
> >       * translation from running too long or occupy too much space.
> These errors
> > -     * should not be masked. XLATE_RECURSION_TOO_DEEP,
> XLATE_TOO_MANY_RESUBMITS
> > -     * and XLATE_STACK_TOO_DEEP fall in this category. */
> > +     * should not be masked. Following errors fall in this category:
> > +     * XLATE_RECURSION_TOO_DEEP, XLATE_TOO_MANY_RESUBMITS,
> > +     * XLATE_STACK_TOO_DEEP and XLATE_THREAD_STACK_EXHAUSTED. */
> >      if (ctx->error == XLATE_TOO_MANY_MPLS_LABELS ||
> >          ctx->error == XLATE_UNSUPPORTED_PACKET_TYPE) {
> >          /* reset the error and continue processing other buckets */
> > diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> > index 6d90c73b8..99a090c15 100644
> > --- a/vswitchd/ovs-vswitchd.c
> > +++ b/vswitchd/ovs-vswitchd.c
> > @@ -94,6 +94,7 @@ main(int argc, char *argv[])
> >      fatal_ignore_sigpipe();
> >
> >      daemonize_start(true, hw_rawio_access);
> > +    ovsthread_stack_init();
> >
> >      if (want_mlockall) {
> >  #ifdef HAVE_MLOCKALL
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to