syrcx created an issue (kamailio/kamailio#4433)
### Description
We observe intermittent Redis failures in `ndb_redis` when using Redis Cluster
with TLS and password (AWS ElastiCache). Errors often appear right after a
MOVED redirection and during reconnect:
- “Resource temporarily unavailable” while reconnecting to the redirected node
- Subsequent “MOVED … host:port”
- The command fails (e.g., GET)
>From reading the module code, there seem to be two distinct issues:
- In cluster mode with `allow_dynamic_nodes=1`, when handling MOVED/ASK,
`ndb_redis` adds a new server entry including only `name/addr/port`. It does
not inherit `pass`, `tls`, `db`, or timeouts, so redirected connections are
attempted without TLS and without AUTH.
- The `tls` parameter parsing overwrites the `pass` buffer, corrupting a
previously parsed password when `tls=1` is present.
This explains the intermittent nature: only keys whose slots are redirected to
“new” nodes (created dynamically without TLS/auth) fail.
### Environment
- Kamailio version: ghcr.io/sergey-safarov/kamailio-ci:master-alpine, which is
from https://github.com/kamailio/kamailio/issues/4401#issuecomment-3352145843
- Redis: AWS ElastiCache Redis Cluster with TLS and AUTH
### Configuration (excerpt)
```cfg
loadmodule "ndb_redis.so"
modparam("ndb_redis", "init_without_redis", 1)
modparam("ndb_redis", "connect_timeout", 15000)
modparam("ndb_redis", "cmd_timeout", 5000)
modparam("ndb_redis", "cluster", 1)
modparam("ndb_redis", "allow_dynamic_nodes", 1)
modparam("ndb_redis", "ca_path", "/etc/ssl/certs")
modparam("ndb_redis", "server", "name=con1;addr=...;port=6379;pass=...;tls=1")
```
### Steps to Reproduce
1) Configure `ndb_redis` for Redis Cluster with TLS and password (as above),
set `cluster=1` and `allow_dynamic_nodes=1`.
2) Execute commands across keys that map to different hash slots (to trigger
MOVED).
3) Observe that some redirected connections report “Resource temporarily
unavailable” and the command fails.
### Actual Behavior
Logs (representative):
```
ERROR: ndb_redis [redis_client.c:677]: redisc_reconnect_server(): error
communicating with redis server
[shared-us-east-1-prod-0001-001.shared-us-east-1-prod.qonwyy.use1.cache.amazonaws.com:6379]
(shared-us-east-1-prod-0001-001.shared-us-east-1-prod.qonwyy.use1.cache.amazonaws.com:6379/0):
Resource temporarily unavailable
ERROR: ndb_redis [redis_client.c:686]: redisc_reconnect_server(): failed to
connect to redis server
[shared-us-east-1-prod-0001-001.shared-us-east-1-prod.qonwyy.use1.cache.amazonaws.com:6379]
(shared-us-east-1-prod-0001-001.shared-us-east-1-prod.qonwyy.use1.cache.amazonaws.com:6379/0)
ERROR: ndb_redis [redis_client.c:998]: check_cluster_reply(): failed connecting
to the new server with name:
shared-us-east-1-prod-0001-001.shared-us-east-1-prod.qonwyy.use1.cache.amazonaws.com:6379
ERROR: ndb_redis [redis_client.c:1143]: redisc_exec(): Redis error:MOVED 10021
shared-us-east-1-prod-0001-001.shared-us-east-1-prod.qonwyy.use1.cache.amazonaws.com:6379
ERROR: <script>: Redis GET failed for key bye_headers:...
```
### Expected Behavior
MOVED redirections should connect using the same TLS, AUTH, DB, and timeout
settings as the original server; the command should then be retried
successfully.
### Code references
- MOVED handling creates new node specs without TLS/AUTH/DB/timeouts, then
attempts to connect:
```911:1021:kamailio/src/modules/ndb_redis/redis_client.c
int check_cluster_reply(redisReply *reply, redisc_server_t **rsrv)
{
...
if((reply->len > 7) && (strncmp(reply->str, "MOVED", 5) == 0)) {
...
rsrv_new = redisc_get_server(&name);
if(rsrv_new) {
*rsrv = rsrv_new;
return 1;
} else if(redis_allow_dynamic_nodes_param) {
...
server_len = snprintf(spec_new,
sizeof(spec_new) - 1,
"name=%.*s;addr=%.*s;port=%i",
name.len, name.s,
addr.len, addr.s, port);
...
if(redisc_add_server(server_new) == 0) {
rsrv_new = redisc_get_server(&name);
if(rsrv_new) {
*rsrv = rsrv_new;
/* Need to connect to the new
server now */
if(redisc_reconnect_server(rsrv_new) == 0) {
...
return 1;
} else {
LM_ERR("failed
connecting to the new server with "
"name:
%.*s\n", name.len, name.s);
return 0;
}
```
- TLS parameter overwrites `pass` buffer (init):
```154:160:kamailio/src/modules/ndb_redis/redis_client.c
} else if(pit->name.len == 3
&& strncmp(pit->name.s, "tls", 3) == 0) {
snprintf(pass, sizeof(pass) - 1, "%.*s", pit->body.len,
pit->body.s);
if(str2int(&pit->body, &enable_ssl) < 0)
enable_ssl = 0;
```
- Same bug in reconnect:
```526:531:kamailio/src/modules/ndb_redis/redis_client.c
} else if(pit->name.len == 3 && strncmp(pit->name.s, "tls", 3) == 0) {
snprintf(pass, sizeof(pass) - 1, "%.*s", pit->body.len, pit->body.s);
if(str2int(&pit->body, &enable_ssl) < 0)
enable_ssl = 0;
```
- Reconnect error reporting path (where “Resource temporarily unavailable” is
logged):
```670:681:kamailio/src/modules/ndb_redis/redis_client.c
err2:
if(sock != 0) {
LM_ERR("error communicating with redis server [%.*s]"
" (unix:%s db:%d): %s\n",
rsrv->sname->len, rsrv->sname->s,
unix_sock_path, db,
rsrv->ctxRedis->errstr);
} else {
LM_ERR("error communicating with redis server [%.*s]
(%s:%d/%d): %s\n",
rsrv->sname->len, rsrv->sname->s, addr, port,
db,
rsrv->ctxRedis->errstr);
}
```
- Minor robustness issue: sentinel master address copy uses `len + 1`:
```579:583:kamailio/src/modules/ndb_redis/redis_client.c
strncpy(addr, res->element[0]->str, res->element[0]->len + 1);
```
### Proposed fixes
- When adding dynamic nodes after MOVED/ASK, include inherited settings in
`spec_new` (at least `pass`, `db`, `tls`, `connect_timeout`, `command_timeout`)
or duplicate/copy the parent server’s `attrs`.
- In both init and reconnect, do not write to `pass` in the `tls` parsing
branch; only parse `enable_ssl`.
- Call `redisSetTimeout()` immediately after creating the hiredis context and
before `AUTH`/`PING`/`SELECT`.
- Bound the sentinel master address copy with a proper length and explicit NUL.
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/4433
You are receiving this because you are subscribed to this thread.
Message ID: <kamailio/kamailio/issues/[email protected]>_______________________________________________
Kamailio - Development Mailing List -- [email protected]
To unsubscribe send an email to [email protected]
Important: keep the mailing list in the recipients, do not reply only to the
sender!