The branch, master has been updated
       via  83f4b51175c s3:selftest: update aio_ratelimit tests for burst support
       via  9b54d8beaef docs-xml/manpages: update doc to add burst_mult parameters
       via  d6332b2caf0 vfs_aio_ratelimit: support human-readable bandwidth limits
       via  306612e09c0 vfs_aio_ratelimit: Add per-share TDB persistence for local rate limiter state
       via  f6a67c361bc vfs_aio_ratelimit: introduce burst-aware token bucket model
      from  31f3bc19d5a quic_ko_wrapper: Fix a typo

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 83f4b51175cdaa20039de7e823bc4c6a15893628
Author: Avan Thakkar <[email protected]>
Date:   Thu Jan 22 21:48:37 2026 +0530

    s3:selftest: update aio_ratelimit tests for burst support
    
    - Replace delay_max configuration with burst_mult parameters.
    - Add three test cases: basic rate limiting, burst behavior, and recovery
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
    
    Signed-off-by: Avan Thakkar <[email protected]>
    Reviewed-by: Shweta Sodani <[email protected]>
    Reviewed-by: Shachar Sharon <[email protected]>
    Reviewed-by: Guenther Deschner <[email protected]>
    Reviewed-by: Anoop C S <[email protected]>
    
    Autobuild-User(master): Günther Deschner <[email protected]>
    Autobuild-Date(master): Fri Feb 27 11:52:46 UTC 2026 on atb-devel-224

commit 9b54d8beaefd9b835b971dd0370d3a1f198121d8
Author: Avan Thakkar <[email protected]>
Date:   Thu Jan 22 21:13:57 2026 +0530

    docs-xml/manpages: update doc to add burst_mult parameters
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
    
    Signed-off-by: Avan Thakkar <[email protected]>
    Reviewed-by: Shweta Sodani <[email protected]>
    Reviewed-by: Shachar Sharon <[email protected]>
    Reviewed-by: Guenther Deschner <[email protected]>
    Reviewed-by: Anoop C S <[email protected]>

commit d6332b2caf03e4fcbed0a67208251a27601c527d
Author: Avan Thakkar <[email protected]>
Date:   Thu Jan 22 20:28:16 2026 +0530

    vfs_aio_ratelimit: support human-readable bandwidth limits
    
    Allow read_bw_limit and write_bw_limit to be specified using
    size suffixes (K/M/G/T).
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
    
    Signed-off-by: Avan Thakkar <[email protected]>
    Reviewed-by: Shweta Sodani <[email protected]>
    Reviewed-by: Shachar Sharon <[email protected]>
    Reviewed-by: Guenther Deschner <[email protected]>
    Reviewed-by: Anoop C S <[email protected]>
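
The suffix handling described in this commit could look roughly like the sketch below. It is not the module's parser, which is not visible in this truncated changeset. parse_bw_limit() is a hypothetical helper, and the use of binary multiples (K = 1024) is an assumption, chosen because it agrees with the documented maximum of 1T and with BYTES_LIMIT_MAX being (1L << 40).

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BYTES_LIMIT_MAX (1LL << 40) /* 1T, same value as in the module */

/*
 * Hypothetical helper: parse "100K", "2M", "1T" or a plain byte count.
 * Assumes binary multiples (K = 2^10, M = 2^20, G = 2^30, T = 2^40).
 */
static bool parse_bw_limit(const char *s, int64_t *out)
{
	char *end = NULL;
	unsigned long long val;
	int shift = 0;

	if (s == NULL || !isdigit((unsigned char)s[0])) {
		return false;
	}

	errno = 0;
	val = strtoull(s, &end, 10);
	if (errno != 0) {
		return false;
	}

	switch (toupper((unsigned char)*end)) {
	case '\0':
		break;
	case 'K': shift = 10; end++; break;
	case 'M': shift = 20; end++; break;
	case 'G': shift = 30; end++; break;
	case 'T': shift = 40; end++; break;
	default:
		return false;
	}

	/* Reject trailing garbage such as "10KB" */
	if (*end != '\0') {
		return false;
	}

	/* Reject values that would exceed the 1T ceiling after scaling */
	if (val > ((unsigned long long)BYTES_LIMIT_MAX >> shift)) {
		return false;
	}

	*out = (int64_t)(val << shift);
	return true;
}
```

Under these assumptions, "100K" parses to 102400 bytes per second and "2T" is rejected as above the maximum.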

commit 306612e09c082282f39789c426ea85cc7e2bb6e3
Author: Avan Thakkar <[email protected]>
Date:   Tue Dec 2 14:20:42 2025 +0530

    vfs_aio_ratelimit: Add per-share TDB persistence for local rate limiter state
    
    Introduce local TDB storage for saving and restoring ratelimiter state
    (iops_tokens, bytes_tokens, last timestamp). Each share now persists
    its read/write limiter state under aio_ratelimit.tdb.
    
    Added VERSION pseudo-key for schema versioning
    
    On disconnect, save the latest state and close TDB.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
    
    Signed-off-by: Avan Thakkar <[email protected]>
    Reviewed-by: Shweta Sodani <[email protected]>
    Reviewed-by: Shachar Sharon <[email protected]>
    Reviewed-by: Guenther Deschner <[email protected]>
    Reviewed-by: Anoop C S <[email protected]>
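
To illustrate the fixed-size record and per-share key described in this commit, here is a minimal standalone sketch that does not link against libtdb. The struct mirrors struct ratelimit_tdb_record from the diff below. make_key() and decode_record() are hypothetical helpers, and the operation names "read" and "write" are assumptions, since the values of rl->op are not visible in the truncated changeset. The packed attribute assumes GCC or Clang, and _Static_assert assumes C11.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the record layout from the commit: 8 + 4 + 4 + 48 = 64 bytes */
struct ratelimit_record {
	uint64_t last_usec;
	float iops_tokens;
	float bytes_tokens;
	uint8_t reserved[64 - (8 + 4 + 4)];
} __attribute__((packed));

_Static_assert(sizeof(struct ratelimit_record) == 64,
	       "record must stay 64 bytes");

/* Hypothetical: build a per-share, per-operation key, "share/<svc>/<op>" */
static bool make_key(char *buf, size_t len,
		     const char *service, const char *op)
{
	int n = snprintf(buf, len, "share/%s/%s", service, op);

	/* Fail on encoding errors and on truncation */
	return n > 0 && (size_t)n < len;
}

/* Hypothetical: decode a stored value, rejecting unexpected sizes */
static bool decode_record(const uint8_t *data, size_t size,
			  struct ratelimit_record *rec)
{
	if (size != sizeof(*rec)) {
		return false;
	}
	memcpy(rec, data, sizeof(*rec));
	return true;
}
```

The size check plays the same role as the dsize comparison in ratelimit_parse_tdb(): a record written with a different layout is ignored rather than misinterpreted.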

commit f6a67c361bcb0d9f4a7f451dcfda800775b5be13
Author: Avan Thakkar <[email protected]>
Date:   Mon Dec 1 18:04:54 2025 +0530

    vfs_aio_ratelimit: introduce burst-aware token bucket model
    
    Refactor the rate limiter to use a continuous token-bucket model with
    configurable burst multiplier. This replaces the older time-window and
    delay_max logic.
    
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
    
    Signed-off-by: Avan Thakkar <[email protected]>
    Reviewed-by: Shweta Sodani <[email protected]>
    Reviewed-by: Shachar Sharon <[email protected]>
    Reviewed-by: Guenther Deschner <[email protected]>
    Reviewed-by: Anoop C S <[email protected]>
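
The continuous token-bucket model named in this commit can be sketched as follows. This is a simplified illustration, not the refactored module code, most of which falls outside the truncated diff. struct bucket, bucket_init() and bucket_charge() are hypothetical names. The sketch assumes a non-zero rate, a monotonic clock, and a bucket that starts full. It follows the description in the module header: tokens accrue with elapsed time up to rate * burst_mult / 10, are consumed per I/O, and a negative balance is converted to a delay in microseconds.

```c
#include <stdint.h>

/* Hypothetical, simplified bucket; the module keeps one per limit type */
struct bucket {
	double rate;        /* tokens per second (the configured limit) */
	double capacity;    /* rate * burst_mult / 10 */
	double tokens;      /* may go negative, which means a deficit */
	uint64_t last_usec; /* time of the previous update */
};

static void bucket_init(struct bucket *b, double rate, int burst_mult,
			uint64_t now_usec)
{
	b->rate = rate;
	b->capacity = rate * burst_mult / 10.0; /* 15 means 1.5x */
	b->tokens = b->capacity; /* assumption: start with a full bucket */
	b->last_usec = now_usec;
}

/*
 * Account for one completed I/O of the given cost (1 for IOPS, the byte
 * count for bandwidth) and return the delay to inject, in microseconds.
 */
static uint64_t bucket_charge(struct bucket *b, double cost,
			      uint64_t now_usec)
{
	double elapsed_sec = (double)(now_usec - b->last_usec) / 1e6;

	b->last_usec = now_usec;

	/* Refill for the elapsed time, capped at the burst capacity */
	b->tokens += elapsed_sec * b->rate;
	if (b->tokens > b->capacity) {
		b->tokens = b->capacity;
	}

	/* Consume; a negative balance is the deficit to pay off */
	b->tokens -= cost;
	if (b->tokens >= 0.0) {
		return 0;
	}

	/* Time needed to earn back the deficit at the steady-state rate */
	return (uint64_t)(-b->tokens / b->rate * 1e6);
}
```

With a rate of 1000 and burst_mult = 15, the first 1500 operations pass without delay, and the next one is delayed by roughly one millisecond, the time needed to earn one token at 1000 tokens per second.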

-----------------------------------------------------------------------

Summary of changes:
 docs-xml/manpages/vfs_aio_ratelimit.8.xml  |  76 +++-
 lib/replace/replace.h                      |   8 +
 selftest/target/Samba3.pm                  |   8 +-
 source3/modules/vfs_aio_ratelimit.c        | 706 +++++++++++++++++++----------
 source3/script/tests/test_aio_ratelimit.sh | 139 +++++-
 5 files changed, 681 insertions(+), 256 deletions(-)


Changeset truncated at 500 lines:

diff --git a/docs-xml/manpages/vfs_aio_ratelimit.8.xml b/docs-xml/manpages/vfs_aio_ratelimit.8.xml
index 43d3e695c08..94f470cc857 100644
--- a/docs-xml/manpages/vfs_aio_ratelimit.8.xml
+++ b/docs-xml/manpages/vfs_aio_ratelimit.8.xml
@@ -32,11 +32,19 @@
        rate-limiting on specific shares by enforcing upper limit on async I/O
        operations. An administrator may define this limit as operations
        per-second or bytes-per-second. When one of those limits is exceeded,
-       a delay value (in milliseconds) is calculated based on current I/O load
+       a delay value (in microseconds) is calculated based on current I/O load
        and injected to async I/O operations, yielding an implicit throughput
        ceiling.
        </para>
 
+       <para>
+       A configurable burst allowance is supported via a burst multiplier,
+       allowing short-term bursts above the steady-state rate while still
+       enforcing a long-term ceiling. Rate-limiter state is periodically
+       persisted to a local TDB, allowing limits to be enforced consistently
+       across client reconnects and smbd restarts.
+       </para>
+
        <para>
        This module operates only on asynchronous VFS READ/WRITE operation.
        </para>
@@ -79,24 +87,27 @@
                <para>
                        Upper limit of READ bandwidth (bytes-per-second) before
                        injecting delays. Zero value implies no limit.
+                       Supports size suffixes (K, M, G, T).
                </para>
                <para>Default: 0, Max: 1T</para>
-               <para>Example: aio_ratelimit:read_bw_limit = 1000000</para>
+               <para>Example: aio_ratelimit:read_bw_limit = 2M</para>
                </listitem>
                </varlistentry>
 
                <varlistentry>
-               <term>aio_ratelimit:read_delay_max = seconds</term>
+               <term>aio_ratelimit:read_burst_mult = value</term>
                <listitem>
                <para>
-                       Maximal allowed delay value, in seconds, for READ.
+                       Burst multiplier for READ operations, expressed in
+                       tenths (e.g., 15 = 1.5x). Defines the token bucket
+                       capacity as a multiple of the rate limit, allowing
+                       short-term bursts above the steady-state rate.
                </para>
-               <para>Default: 30, Max: 300</para>
-               <para>Example: aio_ratelimit:read_delay_max = 15</para>
+               <para>Default: 15 (1.5x), Max: 100 (10x)</para>
+               <para>Example: aio_ratelimit:read_burst_mult = 20</para>
                </listitem>
                </varlistentry>
 
-
                <varlistentry>
                <term>aio_ratelimit:write_iops_limit = count</term>
                <listitem>
@@ -115,26 +126,67 @@
                <para>
                        Upper limit of WRITE bandwidth (bytes-per-second)
                        before injecting delays. Zero value implies no limit.
+                       Supports size suffixes (K, M, G, T).
                </para>
                <para>Default: 0, Max: 1T</para>
-               <para>Example: aio_ratelimit:write_bw_limit = 1000000</para>
+               <para>Example: aio_ratelimit:write_bw_limit = 1M</para>
                </listitem>
                </varlistentry>
 
                <varlistentry>
-               <term>aio_ratelimit:write_delay_max = seconds</term>
+               <term>aio_ratelimit:write_burst_mult = value</term>
                <listitem>
                <para>
-                       Maximal allowed delay value, in seconds, for WRITE.
+                       Burst multiplier for WRITE operations, expressed in
+                       tenths (e.g., 15 = 1.5x). Defines the token bucket
+                       capacity as a multiple of the rate limit, allowing
+                       short-term bursts above the steady-state rate.
                </para>
-               <para>Default: 30, Max: 300</para>
-               <para>Example: aio_ratelimit:write_delay_max = 20</para>
+               <para>Default: 15 (1.5x), Max: 100 (10x)</para>
+               <para>Example: aio_ratelimit:write_burst_mult = 15</para>
                </listitem>
                </varlistentry>
 
        </variablelist>
 </refsect1>
 
+<refsect1>
+       <title>BURST BEHAVIOR</title>
+
+       <para>
+       The <command>read_burst_mult</command> and <command>write_burst_mult</command>
+       parameters control the maximum burst capacity of the rate limiter relative to
+       the configured rate limits. The effective burst capacity is calculated as:
+       <emphasis>rate_limit * (burst_mult / 10)</emphasis>.
+       </para>
+       <para>
+       For example, with <command>read_iops_limit = 1000</command> and
+       <command>read_burst_mult = 15</command>, the burst capacity is
+       1000 * 1.5 = 1500 IOPS.
+       </para>
+       <para>
+       This allows short-term I/O bursts above the steady-state rate while
+       still enforcing the configured long-term limit.
+       </para>
+
+       <para>
+       The appropriate burst multiplier depends on workload characteristics.
+       Workloads with larger or more variable asynchronous I/O requests may
+       require a higher burst value to avoid premature throttling, while
+       smaller or latency-sensitive workloads may benefit from lower values.
+       </para>
+
+       <note>
+       <para>
+       The <command>read_burst_mult</command> and <command>write_burst_mult</command>
+       parameters do not change the long-term average throughput, which remains limited
+       by <command>read_iops_limit</command>/<command>read_bw_limit</command> and
+       <command>write_iops_limit</command>/<command>write_bw_limit</command> respectively.
+       Higher burst values only affect initial acceleration and recovery from idle periods.
+       </para>
+       </note>
+</refsect1>
+
 <refsect1>
        <title>VERSION</title>
 
diff --git a/lib/replace/replace.h b/lib/replace/replace.h
index 49757e0f60d..051583cecc7 100644
--- a/lib/replace/replace.h
+++ b/lib/replace/replace.h
@@ -468,6 +468,14 @@ int rep_dlclose(void *handle);
 #endif
 #endif
 
+#ifndef PACKED_STRUCT
+#if __has_attribute(packed) || (__GNUC__ >= 3)
+#define PACKED_STRUCT __attribute__((packed))
+#else
+#define PACKED_STRUCT
+#endif
+#endif
+
 #if !defined(HAVE_VDPRINTF) || !defined(HAVE_C99_VSNPRINTF)
 #define vdprintf rep_vdprintf
 int rep_vdprintf(int fd, const char *format, va_list ap) PRINTF_ATTRIBUTE(2,0);
diff --git a/selftest/target/Samba3.pm b/selftest/target/Samba3.pm
index 9a059b86f38..b4e7f1a017d 100755
--- a/selftest/target/Samba3.pm
+++ b/selftest/target/Samba3.pm
@@ -3761,11 +3761,11 @@ sub provision($$)
        path = $shrdir
        vfs objects = aio_ratelimit
        aio_ratelimit: read_iops_limit = 10
-       aio_ratelimit: read_bw_limit = 100000
-       aio_ratelimit: read_delay_max = 10
+       aio_ratelimit: read_bw_limit = 100K
+       aio_ratelimit: read_burst_mult = 15
        aio_ratelimit: write_iops_limit = 100
-       aio_ratelimit: write_bw_limit = 100000
-       aio_ratelimit: write_delay_max = 10
+       aio_ratelimit: write_bw_limit = 100K
+       aio_ratelimit: write_burst_mult = 15
 
 include = $aliceconfdir/%U.conf
        ";
diff --git a/source3/modules/vfs_aio_ratelimit.c b/source3/modules/vfs_aio_ratelimit.c
index 6ebc0114c02..3ac1aec34e1 100644
--- a/source3/modules/vfs_aio_ratelimit.c
+++ b/source3/modules/vfs_aio_ratelimit.c
@@ -25,59 +25,96 @@
   I/O path, a delay is injected before sending back a reply to the caller,
   thus causing a rate-limit ceiling.
 
+  A configurable burst allowance is supported via a burst multiplier,
+  allowing short-term bursts above the steady-state rate while still
+  enforcing a long-term ceiling.
+
+  Rate-limiter state (token counters and timestamps) is periodically
+  persisted to a local TDB, allowing limits to be enforced consistently
+  across client reconnects and smbd restarts.
+
   An example to smb.conf segment (zero value implies ignore-this-option):
 
   [share]
   vfs objects = aio_ratelimit ...
   aio_ratelimit: read_iops_limit = 2000
-  aio_ratelimit: read_bw_limit = 2000000
+  aio_ratelimit: read_bw_limit = 2M
+  aio_ratelimit: read_burst_mult = 15      # == 1.5x burst
   aio_ratelimit: write_iops_limit = 0
-  aio_ratelimit: write_bw_limit = 1000000
+  aio_ratelimit: write_bw_limit = 1M
+  aio_ratelimit: write_burst_mult = 15     # == 1.5x burst
   ...
 
   Upon successful completion of async I/O request, tokens are produced based on
   the time which elapsed from previous requests, and tokens are consumed based
-  on actual I/O size. When current tokens value is negative, a delay is
-  calculated end injected to in-flight request. The delay value (microseconds)
+  on actual I/O size. When current token value is negative, a delay is
+  calculated and injected to in-flight request. The delay value (microseconds)
   is calculated based on the current tokens deficit.
  */
 
 #include "includes.h"
 #include "lib/util/time.h"
 #include "lib/util/tevent_unix.h"
+#include "lib/util/util_tdb.h"
+#include "tdb.h"
+#include "system/filesys.h"
 
 #undef DBGC_CLASS
 #define DBGC_CLASS DBGC_VFS
 
-/* Default and maximal delay values, in seconds */
-#define DELAY_SEC_DEF (30L)
-#define DELAY_SEC_MAX (300L)
+#define DELAY_SEC_MAX (100L)
 
-/* Maximal value for iops_limit */
+/* Default burst multiplier (1.5x) */
+#define BURST_MULT_DEF (15)
+
+/* Maximum value for iops_limit */
 #define IOPS_LIMIT_MAX (1000000L)
 
-/* Maximal value for bw_limit */
+/* Maximum value for bw_limit */
 #define BYTES_LIMIT_MAX (1L << 40)
 
-/* Module type-name in smb.conf & debug logging */
+/* Module name in smb.conf & debug logging */
 #define MODULE_NAME "aio_ratelimit"
 
-/* Token-based rate-limiter control state */
+/* How often to save token state to the local TDB, in microseconds */
+#define SAVE_INTERVAL_USEC (30 * 1000000L) /* 30 seconds */
+
+/* TDB schema version */
+#define RATELIMIT_TDB_VERSION 1
+
+static unsigned int ref_count = 0;
+static TDB_CONTEXT *ratelimit_tdb;
+
+/* TDB persistence structure */
+struct ratelimit_tdb_record {
+       uint64_t last_usec;
+       float iops_tokens;
+       float bytes_tokens;
+
+       /* Reserved for future extensions, keeps struct size stable */
+       uint8_t reserved[64 - (8 + 4 + 4)];
+} PACKED_STRUCT;
+
+/* Token-based rate-limiter control state using a token-bucket. */
 struct ratelimiter {
-       const char *oper;
-       struct timespec ts_base;
-       struct timespec ts_last;
-       int64_t iops_limit;
-       int64_t iops_total;
+       const char *op;
+       uint64_t last_usec;
+       uint64_t last_save_usec;
        float iops_tokens;
-       float iops_tokens_max;
-       float iops_tokens_min;
-       int64_t bw_limit;
-       int64_t bytes_total;
        float bytes_tokens;
-       float bytes_tokens_max;
-       float bytes_tokens_min;
-       int64_t delay_sec_max;
+       int64_t iops_total;
+       int64_t bytes_total;
+       int64_t iops_limit;
+       int64_t bw_limit;
+       float iops_capacity;
+       float bytes_capacity;
+
+       /*
+        * burst_mult is kept as a configuration policy.
+        * It allows capacity to be recalculated if limits
+        * are reconfigured in the future (e.g. reload, per-client limits).
+        */
+       float burst_mult;
        int snum;
 };
 
@@ -87,249 +124,403 @@ struct vfs_aio_ratelimit_config {
        struct ratelimiter wr_ratelimiter;
 };
 
-static float maxf(float x, float y)
+static uint64_t time_now_usec(void)
 {
-       return MAX(x, y);
+       struct timespec ts;
+
+       clock_gettime_mono(&ts);
+       return (uint64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
 }
 
-static float minf(float x, float y)
+static bool ratelimit_tdb_check_version(void)
 {
-       return MIN(x, y);
+       TDB_DATA key = {};
+       TDB_DATA val = {};
+       uint32_t version = 0;
+       int ret;
+
+       if (ratelimit_tdb == NULL) {
+               return false;
+       }
+
+       /* Check for existing version */
+       key = string_tdb_data("VERSION");
+       val = tdb_fetch(ratelimit_tdb, key);
+
+       if (val.dptr == NULL) {
+               /* No version key - this is a new TDB, write our version */
+               version = RATELIMIT_TDB_VERSION;
+               val = make_tdb_data((uint8_t *)&version, sizeof(version));
+               ret = tdb_store(ratelimit_tdb, key, val, TDB_INSERT);
+               if (ret != 0) {
+                       DBG_ERR("[%s] Failed to store TDB version\n",
+                               MODULE_NAME);
+                       return false;
+               }
+               DBG_DEBUG("[%s] Initialized TDB version %u\n",
+                         MODULE_NAME,
+                         version);
+               return true;
+       }
+
+       if (val.dsize != sizeof(uint32_t)) {
+               DBG_ERR("[%s] TDB version key has invalid size\n",
+                       MODULE_NAME);
+               SAFE_FREE(val.dptr);
+               return false;
+       }
+
+       memcpy(&version, val.dptr, sizeof(version));
+       SAFE_FREE(val.dptr);
+
+       if (version != RATELIMIT_TDB_VERSION) {
+               DBG_ERR("[%s] TDB version mismatch: found %u, expected %u\n",
+                       MODULE_NAME,
+                       version,
+                       RATELIMIT_TDB_VERSION);
+               return false;
+       }
+
+       DBG_DEBUG("[%s] TDB version %u verified\n", MODULE_NAME, version);
+       return true;
 }
 
-static struct timespec time_now(void)
+static bool ratelimit_tdb_init(void)
 {
-       struct timespec ts;
+       char *dbpath = NULL;
 
-       clock_gettime_mono(&ts);
-       return ts;
+       if (ratelimit_tdb != NULL) {
+               ref_count++;
+               DBG_DEBUG("[%s] TDB already open: ref_count now %u\n",
+                         MODULE_NAME,
+                         ref_count);
+               return true;
+       }
+
+       dbpath = state_path(talloc_tos(), "aio_ratelimit.tdb");
+       if (dbpath == NULL) {
+               DBG_ERR("[%s] Failed to allocate TDB path\n", MODULE_NAME);
+               return false;
+       }
+
+       become_root();
+       ratelimit_tdb = tdb_open(
+               dbpath, 0, TDB_DEFAULT, O_RDWR | O_CREAT, 0600);
+       unbecome_root();
+
+       TALLOC_FREE(dbpath);
+
+       if (ratelimit_tdb == NULL) {
+               DBG_NOTICE("[%s] Failed to open TDB, "
+                          "rate limiting will work without persistence\n",
+                          MODULE_NAME);
+               return false;
+       }
+
+       if (!ratelimit_tdb_check_version()) {
+               DBG_ERR("[%s] TDB version check failed, closing TDB\n",
+                       MODULE_NAME);
+               tdb_close(ratelimit_tdb);
+               ratelimit_tdb = NULL;
+               return false;
+       }
+
+       ref_count++;
+       DBG_DEBUG("[%s] Opened TDB, ref_count now %u\n",
+                 MODULE_NAME,
+                 ref_count);
+       return true;
 }
 
-static int64_t time_diff(const struct timespec *now,
-                        const struct timespec *prev)
+static TDB_DATA ratelimit_make_tdb_key(TALLOC_CTX *mem_ctx,
+                                      const struct ratelimiter *rl,
+                                      const char *servicename)
 {
-       return nsec_time_diff(now, prev) / 1000; /* usec */
+       char *keystr = NULL;
+
+       keystr = talloc_asprintf(mem_ctx, "share/%s/%s", servicename, rl->op);
+
+       return string_tdb_data(keystr);
+}
+
+static void ratelimit_save_tdb(struct ratelimiter *rl)
+{
+       TDB_DATA key = {};
+       TDB_DATA val = {};
+       struct ratelimit_tdb_record record = {};
+       char *servicename = NULL;
+       const struct loadparm_substitution
+               *lp_sub = loadparm_s3_global_substitution();
+
+       servicename = lp_servicename(talloc_tos(), lp_sub, rl->snum);
+
+       if (ratelimit_tdb == NULL) {
+               return;
+       }
+
+       key = ratelimit_make_tdb_key(talloc_tos(), rl, servicename);
+       if (key.dptr == NULL) {
+               return;
+       }
+
+       record.iops_tokens = rl->iops_tokens;
+       record.bytes_tokens = rl->bytes_tokens;
+       record.last_usec = rl->last_usec;
+
+       val = make_tdb_data((uint8_t *)&record, sizeof(record));
+
+       if (tdb_store(ratelimit_tdb, key, val, TDB_REPLACE) != 0) {
+               DBG_ERR("[%s] Failed to store TDB record for %s service=%s\n",
+                       MODULE_NAME,
+                       rl->op,
+                       servicename);
+               TALLOC_FREE(key.dptr);
+               return;
+       }
+
+       DBG_DEBUG("[%s] saved TDB for %s service=%s "
+                 "tokens(i=%.2f,b=%.2f)\n",
+                 MODULE_NAME,
+                 rl->op,
+                 servicename,
+                 rl->iops_tokens,
+                 rl->bytes_tokens);
+
+       TALLOC_FREE(key.dptr);
+}
+
+static int ratelimit_parse_tdb(TDB_DATA key, TDB_DATA val, void *private_data)
+{
+       struct ratelimiter *rl = (struct ratelimiter *)private_data;
+       struct ratelimit_tdb_record record = {};
+
+       if (val.dsize != sizeof(record)) {
+               DBG_WARNING("[%s] TDB record size mismatch\n", MODULE_NAME);
+               return -1;
+       }
+
+       memcpy(&record, val.dptr, sizeof(record));
+       rl->iops_tokens = record.iops_tokens;
+       rl->bytes_tokens = record.bytes_tokens;
+       rl->last_usec = record.last_usec;
+
+       DBG_DEBUG("[%s] loaded TDB for %s tokens(i=%.2f,b=%.2f)\n",
+                 MODULE_NAME,
+                 rl->op,
+                 rl->iops_tokens,
+                 rl->bytes_tokens);
+
+       return 0;
+}
+
+static void ratelimit_load_tdb(struct ratelimiter *rl)


-- 
Samba Shared Repository
