Source: gdbm
Version: 1.18.1-4
Severity: normal

Dear Maintainer,

gdbm_reorganize() leaks open file descriptors. It can be re-produces by
using the python3-gdbm binding:

  python3 -c 'import os,subprocess;import dbm.gnu;DB = "./db.gdbm";db = 
dbm.gnu.open(DB, "csu");db.reorganize();db.close();subprocess.call(f"lsof -p 
{os.getpid()}|grep {DB}", shell=True)'

> python 2119 phahn  DEL-W  REG    254,0             6293029 ./db.gdbm
> python 2119 phahn  DEL-W  REG    254,0             6293042 ./db.gdbm

For each call to reorganize() the DB is copied to a new file which is
then again memory-mapped. This takes one FD each time.

# dpkg -s libgdbm6
Package: libgdbm6
Source: gdbm
Version: 1.18.1-4

# dpkg -s python3-gdbm
Package: python3-gdbm
Source: python3-stdlib-extensions
Version: 3.7.3-1

Debian-9-Stretch     1.8.3-14
Debian-10-Buster     1.18.1-4 (affected)
Debian-11-Bullseye   1.19-2 (affected)
Debian-12-Bookworm   1.23-3 (fixed)


I have bisected the issue:
git bisect start '--term-new' 'fixed' '--term-old' 'broken'
# fixed: [331f05ac9c58d358806fe1bcba88a01467ab0895] Bugfix
git bisect fixed 331f05ac9c58d358806fe1bcba88a01467ab0895
# broken: [778cc81d55aecd6344d577919cec73e4e6980e2e] Version 1.18
git bisect broken 778cc81d55aecd6344d577919cec73e4e6980e2e
# broken: [611cac791f192834d49cd1bd8cfab76a190bfc40] Document new gdbmtool 
options
git bisect broken 611cac791f192834d49cd1bd8cfab76a190bfc40
# fixed: [3052f2b51eaa9ba047d43ae97581ae1fd895c131] Add missing include
git bisect fixed 3052f2b51eaa9ba047d43ae97581ae1fd895c131
# fixed: [e4089536f849644b12d9647aff12f2dac312b940] Version 1.21 released
git bisect fixed e4089536f849644b12d9647aff12f2dac312b940
# broken: [038872b1582c6b99cc352551875aba0e81f6b5f0] Minor fix
git bisect broken 038872b1582c6b99cc352551875aba0e81f6b5f0
# fixed: [a62815cdf0b60725d6445968ce3d5c39e912eb9c] Update docs
git bisect fixed a62815cdf0b60725d6445968ce3d5c39e912eb9c
# fixed: [cc7051ae2ea384863937083a3a60a5a008d511a5] gdbmshell: get rid of 
remaining globals
git bisect fixed cc7051ae2ea384863937083a3a60a5a008d511a5
# fixed: [d19407eaa4b00a724c4ff3744c2f49269183da26] gdbmtool: bugfixes
git bisect fixed d19407eaa4b00a724c4ff3744c2f49269183da26
# fixed: [f77b0273db2d1f0cc1ba2d3256acfab1bda1f584] Fix duplicated mmap in 
gdbm_recover
git bisect fixed f77b0273db2d1f0cc1ba2d3256acfab1bda1f584
# first fixed commit: [f77b0273db2d1f0cc1ba2d3256acfab1bda1f584] Fix duplicated 
mmap in gdbm_recover

Sadly that patch is not easy to backport because GDBM changes some
fundamental details, e.g. "New bucket cache" in v1.21 and "Bucket cache
switched from balanced tree to hash table" in v1.23

A similar bug was reported for Python
<https://bugs.python.org/issue13947>, which links to
<https://git.gnu.org.ua/gdbm.git/commit/?id=a0d6c1a8> which is ALSO
required but not enough.

I have back-ported that patch and so far it works fine, but a second set
of eyes or even a review from upstream would be good.


-- System Information:
Debian Release: 11.5
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 
'stable'), (50, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-18-amd64 (SMP w/4 CPU threads)
Kernel taint flags: TAINT_WARN
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), 
LANGUAGE=de:en_US
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
>From ffeafe9692da956703d28f1f16c81a027a0e6869 Mon Sep 17 00:00:00 2001
Message-Id: 
<ffeafe9692da956703d28f1f16c81a027a0e6869.1666685456.git.h...@univention.de>
From: Sergey Poznyakoff <g...@gnu.org>
Date: Mon, 8 Apr 2019 22:17:10 +0300
Subject: [PATCH 1/2] Preserve locking type during database reorganization

* src/recover.c (_gdbm_finish_transfer): Preserve locking type.
---
 src/recover.c | 1 +
 1 file changed, 1 insertion(+)

diff --git src/recover.c src/recover.c
index 7cc20a2..f1603d9 100644
--- src/recover.c
+++ src/recover.c
@@ -155,6 +155,7 @@ _gdbm_finish_transfer (GDBM_FILE dbf, GDBM_FILE new_dbf,
       free (dbf->bucket_cache);
    }
 
+   dbf->lock_type         = new_dbf->lock_type;
    dbf->desc              = new_dbf->desc;
    dbf->header            = new_dbf->header;
    dbf->dir               = new_dbf->dir;
-- 
2.30.2

>From 40c6da6ca9972ff0b9d7d084fe118c765808fef6 Mon Sep 17 00:00:00 2001
Message-Id: 
<40c6da6ca9972ff0b9d7d084fe118c765808fef6.1666685456.git.h...@univention.de>
In-Reply-To: 
<ffeafe9692da956703d28f1f16c81a027a0e6869.1666685456.git.h...@univention.de>
References: 
<ffeafe9692da956703d28f1f16c81a027a0e6869.1666685456.git.h...@univention.de>
From: Sergey Poznyakoff <g...@gnu.org>
Date: Wed, 11 Aug 2021 23:01:37 +0300
Subject: [PATCH 2/2] Fix duplicated mmap in gdbm_recover

* src/recover.c (_gdbm_finish_transfer): Reuse memory mapping
from the intermediate dbm structure.
---
 src/recover.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git src/recover.c src/recover.c
index f1603d9..893e0a2 100644
--- src/recover.c
+++ src/recover.c
@@ -169,14 +169,14 @@ _gdbm_finish_transfer (GDBM_FILE dbf, GDBM_FILE new_dbf,
    dbf->bucket_changed    = new_dbf->bucket_changed;
    dbf->second_changed    = new_dbf->second_changed;
 
+  dbf->mapped_size_max   = new_dbf->mapped_size_max;    
+  dbf->mapped_region    = new_dbf->mapped_region;      
+  dbf->mapped_size      = new_dbf->mapped_size;        
+  dbf->mapped_pos       = new_dbf->mapped_pos;         
+  dbf->mapped_off       = new_dbf->mapped_off;         
+
    free (new_dbf->name);
    free (new_dbf);
-   
- #if HAVE_MMAP
-   /* Re-initialize mapping if required */
-   if (dbf->memory_mapping)
-     _gdbm_mapped_init (dbf);
- #endif
 
    /* Make sure the new database is all on disk. */
    gdbm_file_sync (dbf);
-- 
2.30.2

Reply via email to