The branch, master has been updated
via 83f4b51175c s3:selftest: update aio_ratelimit tests for burst support
via 9b54d8beaef docs-xml/manpages: update doc to add burst_mult parameters
via d6332b2caf0 vfs_aio_ratelimit: support human-readable bandwidth limits
via 306612e09c0 vfs_aio_ratelimit: Add per-share TDB persistence for local rate limiter state
via f6a67c361bc vfs_aio_ratelimit: introduce burst-aware token bucket model
from 31f3bc19d5a quic_ko_wrapper: Fix a typo
https://git.samba.org/?p=samba.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit 83f4b51175cdaa20039de7e823bc4c6a15893628
Author: Avan Thakkar <[email protected]>
Date: Thu Jan 22 21:48:37 2026 +0530
s3:selftest: update aio_ratelimit tests for burst support
- Replace delay_max configuration with burst_mult parameters.
- Add three test cases: basic rate limiting, burst behavior, and recovery
BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
Signed-off-by: Avan Thakkar <[email protected]>
Reviewed-by: Shweta Sodani <[email protected]>
Reviewed-by: Shachar Sharon <[email protected]>
Reviewed-by: Guenther Deschner <[email protected]>
Reviewed-by: Anoop C S <[email protected]>
Autobuild-User(master): Günther Deschner <[email protected]>
Autobuild-Date(master): Fri Feb 27 11:52:46 UTC 2026 on atb-devel-224
commit 9b54d8beaefd9b835b971dd0370d3a1f198121d8
Author: Avan Thakkar <[email protected]>
Date: Thu Jan 22 21:13:57 2026 +0530
docs-xml/manpages: update doc to add burst_mult parameters
BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
Signed-off-by: Avan Thakkar <[email protected]>
Reviewed-by: Shweta Sodani <[email protected]>
Reviewed-by: Shachar Sharon <[email protected]>
Reviewed-by: Guenther Deschner <[email protected]>
Reviewed-by: Anoop C S <[email protected]>
commit d6332b2caf03e4fcbed0a67208251a27601c527d
Author: Avan Thakkar <[email protected]>
Date: Thu Jan 22 20:28:16 2026 +0530
vfs_aio_ratelimit: support human-readable bandwidth limits
Allow read_bw_limit and write_bw_limit to be specified using
size suffixes (K/M/G/T).
BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
Signed-off-by: Avan Thakkar <[email protected]>
Reviewed-by: Shweta Sodani <[email protected]>
Reviewed-by: Shachar Sharon <[email protected]>
Reviewed-by: Guenther Deschner <[email protected]>
Reviewed-by: Anoop C S <[email protected]>
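Illustration (not part of the commit): a minimal sketch of how a value such as "2M" or "100K" could be turned into a byte count. The helper name parse_size_suffix is hypothetical, and treating K/M/G/T as binary multiples (1K = 1024) is an assumption, suggested only by the 1T maximum being defined as (1L << 40) in the module. The parser the module actually uses is not shown in this truncated changeset.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "<digits>[K|M|G|T]" into a byte count.
 * Assumes binary multiples (K = 2^10, M = 2^20, G = 2^30, T = 2^40).
 * Returns false on malformed input or on overflow.
 */
static bool parse_size_suffix(const char *s, uint64_t *out)
{
	char *end = NULL;
	unsigned long long v;
	unsigned shift = 0;

	assert(s != NULL && out != NULL);

	/* Reject empty strings, signs and leading whitespace up front */
	if (!isdigit((unsigned char)s[0])) {
		return false;
	}

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno != 0 || end == s) {
		return false;
	}

	switch (toupper((unsigned char)*end)) {
	case '\0': shift = 0; break;
	case 'K': shift = 10; end++; break;
	case 'M': shift = 20; end++; break;
	case 'G': shift = 30; end++; break;
	case 'T': shift = 40; end++; break;
	default: return false;
	}

	/* Nothing may follow the optional suffix */
	if (*end != '\0') {
		return false;
	}

	/* Refuse values that would overflow once scaled */
	if (v > (UINT64_MAX >> shift)) {
		return false;
	}

	*out = (uint64_t)v << shift;
	return true;
}
```

Under these assumptions "2M" yields 2097152 and a plain "1000000" is accepted unchanged, so existing numeric settings keep working.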
commit 306612e09c082282f39789c426ea85cc7e2bb6e3
Author: Avan Thakkar <[email protected]>
Date: Tue Dec 2 14:20:42 2025 +0530
vfs_aio_ratelimit: Add per-share TDB persistence for local rate limiter state
Introduce local TDB storage for saving and restoring ratelimiter state
(iops_tokens, bytes_tokens, last timestamp). Each share now persists
its read/write limiter state under aio_ratelimit.tdb.
Add a VERSION pseudo-key for schema versioning.
On disconnect, save the latest state and close the TDB.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
Signed-off-by: Avan Thakkar <[email protected]>
Reviewed-by: Shweta Sodani <[email protected]>
Reviewed-by: Shachar Sharon <[email protected]>
Reviewed-by: Guenther Deschner <[email protected]>
Reviewed-by: Anoop C S <[email protected]>
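Illustration (not part of the commit): a self-contained sketch of the persisted record layout and key format described above, without linking against libtdb. The field layout and the "share/<service>/<op>" key pattern follow the diff further down. The names make_key and record_load, and the op string "read", are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size record: 8 + 4 + 4 bytes of state, padded to 64 bytes */
struct record {
	uint64_t last_usec;
	float iops_tokens;
	float bytes_tokens;
	uint8_t reserved[64 - (8 + 4 + 4)];
} __attribute__((packed));

/* Catch accidental layout changes at compile time */
_Static_assert(sizeof(struct record) == 64, "record must stay 64 bytes");

/* Hypothetical helper: build "share/<service>/<op>"; -1 on truncation */
static int make_key(char *buf, size_t buflen,
		    const char *service, const char *op)
{
	int n = snprintf(buf, buflen, "share/%s/%s", service, op);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Hypothetical helper: accept a stored blob only if its size matches */
static bool record_load(struct record *rec, const uint8_t *data, size_t len)
{
	if (len != sizeof(*rec)) {
		return false;
	}
	memcpy(rec, data, sizeof(*rec));
	return true;
}
```

Rejecting a blob whose size differs from the record size mirrors the size check in the module's parse callback. The reserved bytes leave room for new fields without changing the stored size.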
commit f6a67c361bcb0d9f4a7f451dcfda800775b5be13
Author: Avan Thakkar <[email protected]>
Date: Mon Dec 1 18:04:54 2025 +0530
vfs_aio_ratelimit: introduce burst-aware token bucket model
Refactor the rate limiter to use a continuous token-bucket model with
configurable burst multiplier. This replaces the older time-window and
delay_max logic.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=16000
Signed-off-by: Avan Thakkar <[email protected]>
Reviewed-by: Shweta Sodani <[email protected]>
Reviewed-by: Shachar Sharon <[email protected]>
Reviewed-by: Guenther Deschner <[email protected]>
Reviewed-by: Anoop C S <[email protected]>
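Illustration (not part of the commit): a minimal sketch of a continuous token bucket with a burst multiplier expressed in tenths. The capacity formula, rate_limit * (burst_mult / 10), comes from the man page change in this push. Everything else is an assumption: the names struct bucket, bucket_init and bucket_charge are hypothetical, the bucket is assumed to start full, and the delay is assumed to be the time needed to refill the deficit at the steady-state rate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bucket state; not the module's struct ratelimiter */
struct bucket {
	double tokens;      /* current tokens; negative means a deficit */
	double rate;        /* refill rate, tokens per second */
	double capacity;    /* rate * burst_mult / 10 */
	uint64_t last_usec; /* time of the previous charge */
};

/* burst_mult is in tenths: 15 means 1.5x the steady-state rate */
static void bucket_init(struct bucket *b, double rate, int burst_mult,
			uint64_t now_usec)
{
	assert(rate > 0.0 && burst_mult > 0);
	b->rate = rate;
	b->capacity = rate * burst_mult / 10.0;
	b->tokens = b->capacity; /* assumption: start with a full bucket */
	b->last_usec = now_usec;
}

/*
 * Refill for the elapsed time (capped at capacity), consume 'cost'
 * tokens, and return the delay in microseconds needed to pay off
 * any resulting deficit. Returns 0 when no delay is required.
 */
static uint64_t bucket_charge(struct bucket *b, double cost,
			      uint64_t now_usec)
{
	double elapsed_sec = (double)(now_usec - b->last_usec) / 1e6;

	b->last_usec = now_usec;
	b->tokens += elapsed_sec * b->rate;
	if (b->tokens > b->capacity) {
		b->tokens = b->capacity;
	}

	b->tokens -= cost;
	if (b->tokens >= 0.0) {
		return 0;
	}
	return (uint64_t)(-b->tokens / b->rate * 1e6);
}
```

With a rate of 1000 and burst_mult = 15 the capacity is 1500 tokens: a burst of 1500 operations passes without delay, the next 500 incur a half-second delay, and after an idle period the bucket refills only up to 1500, so the long-term average stays at the configured rate.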
-----------------------------------------------------------------------
Summary of changes:
docs-xml/manpages/vfs_aio_ratelimit.8.xml | 76 +++-
lib/replace/replace.h | 8 +
selftest/target/Samba3.pm | 8 +-
source3/modules/vfs_aio_ratelimit.c | 706 +++++++++++++++++++----------
source3/script/tests/test_aio_ratelimit.sh | 139 +++++-
5 files changed, 681 insertions(+), 256 deletions(-)
Changeset truncated at 500 lines:
diff --git a/docs-xml/manpages/vfs_aio_ratelimit.8.xml b/docs-xml/manpages/vfs_aio_ratelimit.8.xml
index 43d3e695c08..94f470cc857 100644
--- a/docs-xml/manpages/vfs_aio_ratelimit.8.xml
+++ b/docs-xml/manpages/vfs_aio_ratelimit.8.xml
@@ -32,11 +32,19 @@
rate-limiting on specific shares by enforcing upper limit on async I/O
operations. An administrator may define this limit as operations
per-second or bytes-per-second. When one of those limits is exceeded,
- a delay value (in milliseconds) is calculated based on current I/O load
+ a delay value (in microseconds) is calculated based on current I/O load
and injected to async I/O operations, yielding an implicit throughput
ceiling.
</para>
+ <para>
+ A configurable burst allowance is supported via a burst multiplier,
+ allowing short-term bursts above the steady-state rate while still
+ enforcing a long-term ceiling. Rate-limiter state is periodically
+ persisted to a local TDB, allowing limits to be enforced consistently
+ across client reconnects and smbd restarts.
+ </para>
+
<para>
This module operates only on asynchronous VFS READ/WRITE operation.
</para>
@@ -79,24 +87,27 @@
<para>
Upper limit of READ bandwidth (bytes-per-second) before
injecting delays. Zero value implies no limit.
+ Supports size suffixes (K, M, G, T).
</para>
<para>Default: 0, Max: 1T</para>
- <para>Example: aio_ratelimit:read_bw_limit = 1000000</para>
+ <para>Example: aio_ratelimit:read_bw_limit = 2M</para>
</listitem>
</varlistentry>
<varlistentry>
- <term>aio_ratelimit:read_delay_max = seconds</term>
+ <term>aio_ratelimit:read_burst_mult = value</term>
<listitem>
<para>
- Maximal allowed delay value, in seconds, for READ.
+ Burst multiplier for READ operations, expressed in
+ tenths (e.g., 15 = 1.5x). Defines the token bucket
+ capacity as a multiple of the rate limit, allowing
+ short-term bursts above the steady-state rate.
</para>
- <para>Default: 30, Max: 300</para>
- <para>Example: aio_ratelimit:read_delay_max = 15</para>
+ <para>Default: 15 (1.5x), Max: 100 (10x)</para>
+ <para>Example: aio_ratelimit:read_burst_mult = 20</para>
</listitem>
</varlistentry>
-
<varlistentry>
<term>aio_ratelimit:write_iops_limit = count</term>
<listitem>
@@ -115,26 +126,67 @@
<para>
Upper limit of WRITE bandwidth (bytes-per-second)
before injecting delays. Zero value implies no limit.
+ Supports size suffixes (K, M, G, T).
</para>
<para>Default: 0, Max: 1T</para>
- <para>Example: aio_ratelimit:write_bw_limit = 1000000</para>
+ <para>Example: aio_ratelimit:write_bw_limit = 1M</para>
</listitem>
</varlistentry>
<varlistentry>
- <term>aio_ratelimit:write_delay_max = seconds</term>
+ <term>aio_ratelimit:write_burst_mult = value</term>
<listitem>
<para>
- Maximal allowed delay value, in seconds, for WRITE.
+ Burst multiplier for WRITE operations, expressed in
+ tenths (e.g., 15 = 1.5x). Defines the token bucket
+ capacity as a multiple of the rate limit, allowing
+ short-term bursts above the steady-state rate.
</para>
- <para>Default: 30, Max: 300</para>
- <para>Example: aio_ratelimit:write_delay_max = 20</para>
+ <para>Default: 15 (1.5x), Max: 100 (10x)</para>
+ <para>Example: aio_ratelimit:write_burst_mult = 15</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
+<refsect1>
+ <title>BURST BEHAVIOR</title>
+
+ <para>
+ The <command>read_burst_mult</command> and <command>write_burst_mult</command>
+ parameters control the maximum burst capacity of the rate limiter relative to
+ the configured rate limits. The effective burst capacity is calculated as:
+ <emphasis>rate_limit * (burst_mult / 10)</emphasis>.
+ </para>
+ <para>
+ For example, with <command>read_iops_limit = 1000</command> and
+ <command>read_burst_mult = 15</command>, the burst capacity is
+ 1000 * 1.5 = 1500 IOPS.
+ </para>
+ <para>
+ This allows short-term I/O bursts above the steady-state rate while
+ still enforcing the configured long-term limit.
+ </para>
+
+ <para>
+ The appropriate burst multiplier depends on workload characteristics.
+ Workloads with larger or more variable asynchronous I/O requests may
+ require a higher burst value to avoid premature throttling, while
+ smaller or latency-sensitive workloads may benefit from lower values.
+ </para>
+
+ <note>
+ <para>
+ The <command>read_burst_mult</command> and <command>write_burst_mult</command>
+ parameters do not change the long-term average throughput, which remains limited
+ by <command>read_iops_limit</command>/<command>read_bw_limit</command> and
+ <command>write_iops_limit</command>/<command>write_bw_limit</command> respectively.
+ Higher burst values only affect initial acceleration and recovery from idle periods.
+ </para>
+ </note>
+</refsect1>
+
<refsect1>
<title>VERSION</title>
diff --git a/lib/replace/replace.h b/lib/replace/replace.h
index 49757e0f60d..051583cecc7 100644
--- a/lib/replace/replace.h
+++ b/lib/replace/replace.h
@@ -468,6 +468,14 @@ int rep_dlclose(void *handle);
#endif
#endif
+#ifndef PACKED_STRUCT
+#if __has_attribute(packed) || (__GNUC__ >= 3)
+#define PACKED_STRUCT __attribute__((packed))
+#else
+#define PACKED_STRUCT
+#endif
+#endif
+
#if !defined(HAVE_VDPRINTF) || !defined(HAVE_C99_VSNPRINTF)
#define vdprintf rep_vdprintf
int rep_vdprintf(int fd, const char *format, va_list ap) PRINTF_ATTRIBUTE(2,0);
diff --git a/selftest/target/Samba3.pm b/selftest/target/Samba3.pm
index 9a059b86f38..b4e7f1a017d 100755
--- a/selftest/target/Samba3.pm
+++ b/selftest/target/Samba3.pm
@@ -3761,11 +3761,11 @@ sub provision($$)
path = $shrdir
vfs objects = aio_ratelimit
aio_ratelimit: read_iops_limit = 10
- aio_ratelimit: read_bw_limit = 100000
- aio_ratelimit: read_delay_max = 10
+ aio_ratelimit: read_bw_limit = 100K
+ aio_ratelimit: read_burst_mult = 15
aio_ratelimit: write_iops_limit = 100
- aio_ratelimit: write_bw_limit = 100000
- aio_ratelimit: write_delay_max = 10
+ aio_ratelimit: write_bw_limit = 100K
+ aio_ratelimit: write_burst_mult = 15
include = $aliceconfdir/%U.conf
";
diff --git a/source3/modules/vfs_aio_ratelimit.c b/source3/modules/vfs_aio_ratelimit.c
index 6ebc0114c02..3ac1aec34e1 100644
--- a/source3/modules/vfs_aio_ratelimit.c
+++ b/source3/modules/vfs_aio_ratelimit.c
@@ -25,59 +25,96 @@
I/O path, a delay is injected before sending back a reply to the caller,
thus causing a rate-limit ceiling.
+ A configurable burst allowance is supported via a burst multiplier,
+ allowing short-term bursts above the steady-state rate while still
+ enforcing a long-term ceiling.
+
+ Rate-limiter state (token counters and timestamps) is periodically
+ persisted to a local TDB, allowing limits to be enforced consistently
+ across client reconnects and smbd restarts.
+
An example to smb.conf segment (zero value implies ignore-this-option):
[share]
vfs objects = aio_ratelimit ...
aio_ratelimit: read_iops_limit = 2000
- aio_ratelimit: read_bw_limit = 2000000
+ aio_ratelimit: read_bw_limit = 2M
+ aio_ratelimit: read_burst_mult = 15 # == 1.5x burst
aio_ratelimit: write_iops_limit = 0
- aio_ratelimit: write_bw_limit = 1000000
+ aio_ratelimit: write_bw_limit = 1M
+ aio_ratelimit: write_burst_mult = 15 # == 1.5x burst
...
Upon successful completion of async I/O request, tokens are produced based on
the time which elapsed from previous requests, and tokens are consumed based
- on actual I/O size. When current tokens value is negative, a delay is
- calculated end injected to in-flight request. The delay value (microseconds)
+ on actual I/O size. When current token value is negative, a delay is
+ calculated and injected to in-flight request. The delay value (microseconds)
is calculated based on the current tokens deficit.
*/
#include "includes.h"
#include "lib/util/time.h"
#include "lib/util/tevent_unix.h"
+#include "lib/util/util_tdb.h"
+#include "tdb.h"
+#include "system/filesys.h"
#undef DBGC_CLASS
#define DBGC_CLASS DBGC_VFS
-/* Default and maximal delay values, in seconds */
-#define DELAY_SEC_DEF (30L)
-#define DELAY_SEC_MAX (300L)
+#define DELAY_SEC_MAX (100L)
-/* Maximal value for iops_limit */
+/* Default burst multiplier (1.5x) */
+#define BURST_MULT_DEF (15)
+
+/* Maximum value for iops_limit */
#define IOPS_LIMIT_MAX (1000000L)
-/* Maximal value for bw_limit */
+/* Maximum value for bw_limit */
#define BYTES_LIMIT_MAX (1L << 40)
-/* Module type-name in smb.conf & debug logging */
+/* Module name in smb.conf & debug logging */
#define MODULE_NAME "aio_ratelimit"
-/* Token-based rate-limiter control state */
+/* How often to save token state to the local TDB, in microseconds */
+#define SAVE_INTERVAL_USEC (30 * 1000000L) /* 30 seconds */
+
+/* TDB schema version */
+#define RATELIMIT_TDB_VERSION 1
+
+static unsigned int ref_count = 0;
+static TDB_CONTEXT *ratelimit_tdb;
+
+/* TDB persistence structure */
+struct ratelimit_tdb_record {
+ uint64_t last_usec;
+ float iops_tokens;
+ float bytes_tokens;
+
+ /* Reserved for future extensions, keeps struct size stable */
+ uint8_t reserved[64 - (8 + 4 + 4)];
+} PACKED_STRUCT;
+
+/* Token-based rate-limiter control state using a token-bucket. */
struct ratelimiter {
- const char *oper;
- struct timespec ts_base;
- struct timespec ts_last;
- int64_t iops_limit;
- int64_t iops_total;
+ const char *op;
+ uint64_t last_usec;
+ uint64_t last_save_usec;
float iops_tokens;
- float iops_tokens_max;
- float iops_tokens_min;
- int64_t bw_limit;
- int64_t bytes_total;
float bytes_tokens;
- float bytes_tokens_max;
- float bytes_tokens_min;
- int64_t delay_sec_max;
+ int64_t iops_total;
+ int64_t bytes_total;
+ int64_t iops_limit;
+ int64_t bw_limit;
+ float iops_capacity;
+ float bytes_capacity;
+
+ /*
+ * burst_mult is kept as a configuration policy.
+ * It allows capacity to be recalculated if limits
+ * are reconfigured in the future (e.g. reload, per-client limits).
+ */
+ float burst_mult;
int snum;
};
@@ -87,249 +124,403 @@ struct vfs_aio_ratelimit_config {
struct ratelimiter wr_ratelimiter;
};
-static float maxf(float x, float y)
+static uint64_t time_now_usec(void)
{
- return MAX(x, y);
+ struct timespec ts;
+
+ clock_gettime_mono(&ts);
+ return (uint64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}
-static float minf(float x, float y)
+static bool ratelimit_tdb_check_version(void)
{
- return MIN(x, y);
+ TDB_DATA key = {};
+ TDB_DATA val = {};
+ uint32_t version = 0;
+ int ret;
+
+ if (ratelimit_tdb == NULL) {
+ return false;
+ }
+
+ /* Check for existing version */
+ key = string_tdb_data("VERSION");
+ val = tdb_fetch(ratelimit_tdb, key);
+
+ if (val.dptr == NULL) {
+ /* No version key - this is a new TDB, write our version */
+ version = RATELIMIT_TDB_VERSION;
+ val = make_tdb_data((uint8_t *)&version, sizeof(version));
+ ret = tdb_store(ratelimit_tdb, key, val, TDB_INSERT);
+ if (ret != 0) {
+ DBG_ERR("[%s] Failed to store TDB version\n",
+ MODULE_NAME);
+ return false;
+ }
+ DBG_DEBUG("[%s] Initialized TDB version %u\n",
+ MODULE_NAME,
+ version);
+ return true;
+ }
+
+ if (val.dsize != sizeof(uint32_t)) {
+ DBG_ERR("[%s] TDB version key has invalid size\n",
+ MODULE_NAME);
+ SAFE_FREE(val.dptr);
+ return false;
+ }
+
+ memcpy(&version, val.dptr, sizeof(version));
+ SAFE_FREE(val.dptr);
+
+ if (version != RATELIMIT_TDB_VERSION) {
+ DBG_ERR("[%s] TDB version mismatch: found %u, expected %u\n",
+ MODULE_NAME,
+ version,
+ RATELIMIT_TDB_VERSION);
+ return false;
+ }
+
+ DBG_DEBUG("[%s] TDB version %u verified\n", MODULE_NAME, version);
+ return true;
}
-static struct timespec time_now(void)
+static bool ratelimit_tdb_init(void)
{
- struct timespec ts;
+ char *dbpath = NULL;
- clock_gettime_mono(&ts);
- return ts;
+ if (ratelimit_tdb != NULL) {
+ ref_count++;
+ DBG_DEBUG("[%s] TDB already open: ref_count now %u\n",
+ MODULE_NAME,
+ ref_count);
+ return true;
+ }
+
+ dbpath = state_path(talloc_tos(), "aio_ratelimit.tdb");
+ if (dbpath == NULL) {
+ DBG_ERR("[%s] Failed to allocate TDB path\n", MODULE_NAME);
+ return false;
+ }
+
+ become_root();
+ ratelimit_tdb = tdb_open(
+ dbpath, 0, TDB_DEFAULT, O_RDWR | O_CREAT, 0600);
+ unbecome_root();
+
+ TALLOC_FREE(dbpath);
+
+ if (ratelimit_tdb == NULL) {
+ DBG_NOTICE("[%s] Failed to open TDB, "
+ "rate limiting will work without persistence\n",
+ MODULE_NAME);
+ return false;
+ }
+
+ if (!ratelimit_tdb_check_version()) {
+ DBG_ERR("[%s] TDB version check failed, closing TDB\n",
+ MODULE_NAME);
+ tdb_close(ratelimit_tdb);
+ ratelimit_tdb = NULL;
+ return false;
+ }
+
+ ref_count++;
+ DBG_DEBUG("[%s] Opened TDB, ref_count now %u\n",
+ MODULE_NAME,
+ ref_count);
+ return true;
}
-static int64_t time_diff(const struct timespec *now,
- const struct timespec *prev)
+static TDB_DATA ratelimit_make_tdb_key(TALLOC_CTX *mem_ctx,
+ const struct ratelimiter *rl,
+ const char *servicename)
{
- return nsec_time_diff(now, prev) / 1000; /* usec */
+ char *keystr = NULL;
+
+ keystr = talloc_asprintf(mem_ctx, "share/%s/%s", servicename, rl->op);
+
+ return string_tdb_data(keystr);
+}
+
+static void ratelimit_save_tdb(struct ratelimiter *rl)
+{
+ TDB_DATA key = {};
+ TDB_DATA val = {};
+ struct ratelimit_tdb_record record = {};
+ char *servicename = NULL;
+ const struct loadparm_substitution
+ *lp_sub = loadparm_s3_global_substitution();
+
+ servicename = lp_servicename(talloc_tos(), lp_sub, rl->snum);
+
+ if (ratelimit_tdb == NULL) {
+ return;
+ }
+
+ key = ratelimit_make_tdb_key(talloc_tos(), rl, servicename);
+ if (key.dptr == NULL) {
+ return;
+ }
+
+ record.iops_tokens = rl->iops_tokens;
+ record.bytes_tokens = rl->bytes_tokens;
+ record.last_usec = rl->last_usec;
+
+ val = make_tdb_data((uint8_t *)&record, sizeof(record));
+
+ if (tdb_store(ratelimit_tdb, key, val, TDB_REPLACE) != 0) {
+ DBG_ERR("[%s] Failed to store TDB record for %s service=%s\n",
+ MODULE_NAME,
+ rl->op,
+ servicename);
+ TALLOC_FREE(key.dptr);
+ return;
+ }
+
+ DBG_DEBUG("[%s] saved TDB for %s service=%s "
+ "tokens(i=%.2f,b=%.2f)\n",
+ MODULE_NAME,
+ rl->op,
+ servicename,
+ rl->iops_tokens,
+ rl->bytes_tokens);
+
+ TALLOC_FREE(key.dptr);
+}
+
+static int ratelimit_parse_tdb(TDB_DATA key, TDB_DATA val, void *private_data)
+{
+ struct ratelimiter *rl = (struct ratelimiter *)private_data;
+ struct ratelimit_tdb_record record = {};
+
+ if (val.dsize != sizeof(record)) {
+ DBG_WARNING("[%s] TDB record size mismatch\n", MODULE_NAME);
+ return -1;
+ }
+
+ memcpy(&record, val.dptr, sizeof(record));
+ rl->iops_tokens = record.iops_tokens;
+ rl->bytes_tokens = record.bytes_tokens;
+ rl->last_usec = record.last_usec;
+
+ DBG_DEBUG("[%s] loaded TDB for %s tokens(i=%.2f,b=%.2f)\n",
+ MODULE_NAME,
+ rl->op,
+ rl->iops_tokens,
+ rl->bytes_tokens);
+
+ return 0;
+}
+
+static void ratelimit_load_tdb(struct ratelimiter *rl)
--
Samba Shared Repository