Bug#852757: apt calls malloc inside SIGWINCH handler, leading to deadlock
Hello, I created a merge request on salsa.debian.org. The patch is same as my last email's. Here is the URL: https://salsa.debian.org/apt-team/apt/-/merge_requests/204 Best regards, Zhang Boyang
Bug#852757: apt calls malloc inside SIGWINCH handler, leading to deadlock
Hello, On 2021/11/25 21:03, David Kalnischkies wrote: Hi, On Thu, Nov 25, 2021 at 07:30:45PM +0800, Zhang Boyang wrote: It seems my last email was marked spam. It's strange that the bug tracker has received my reply but the mailing list didn't. Please correct me if I misunderstand the situation. Indeed, looks like your mail was dropped at some point. If you supply sufficient details listmasters might be able to tell you what exactly went wrong. (The BTS seems to assign your mails ~ -10 points which is very good, but at the same time the list mail which passed through got 0, which is not bad, but also not particular good citing e.g. "MIME_BASE64_TEXT_BOGUS(1) R_DKIM_REJECT(1)". With patches you might be better of in salsa btw.) I talked to the listmaster. The listmaster told me my mail is too large for deity mail list. I will try to reduce my mail size in future :) The patch is attached again for convenience. This patch implemented Anders's idea. The signal handler will only set a flag. After signal handler completed, the pselect() in pkgDPkgPM::Go() will return immediately with errno set to EINTR. Then, progress->Pluse() is called by pkgDPkgPM::Go(), and PackageManagerFancy::Pluse() will redraw the status bar. Not having looked at the issue itself I just gave the patch a casual glance: PackageManagerFancy is a public – hence exported – interface, so you can't change member variables or add virtual methods as that breaks the ABI. I revised my patch. Now the patch only add static member variables, which is not harmful to ABI. The Pluse() method already exists in base class but newly overridden. I think this change will not crash existing programs. However, existing compiled programs can still use BaseClass::Pluse() because compilers may optimize virtual calls. Personally I think this is not harmful because this will only cause status line not redrawing, which may consider acceptable. At some point we will have one I am sure, but if at all possible it could be better to investigate if that can be solved some other way. (Not exactly sure why that is public at all, given there seems to also be a factory function to be able to hide it?) If the Pluse() must not be overridden. I think the only way is to modify pkgDPkgPM::Go(), let it call some static method in PackageManagerFancy just after pselect(). However I'm not sure because I'm new to apt. Can we perhaps make it (still) easier to reproduce? apt can be told to use a different dpkg via Dir::Bin::dpkg – can we have that be a script which just spits out 'offending' lines (= is it the fd messages or the messages on stdout for example – or both) ? Yes, with temporary modifications to code. I added another patch, which will call malloc() in a tight loop while simulating. To reproduce the problem, apply that patch and send SIGWINCH to apt continuously, like: while :; do killall -SIGWINCH apt; done Then, run apt like: ./apt install -s gnome You will see apt crash in few seconds after install simulation started. (Sorry for not being very specific at the moment as I am a bit starved for Debian time; will try to give that some proper thought/review ~next week) Best regards David Kalnischkies Have a nice day :) Zhang BoyangFrom 4c8f9d5ab13f3069caefeada38658b72f0ad1782 Mon Sep 17 00:00:00 2001 From: Zhang Boyang Date: Thu, 2 Dec 2021 00:16:07 +0800 Subject: [PATCH 1/2] Make heavy use of malloc() while simulating --- apt-pkg/algorithms.cc | 10 ++ 1 file changed, 10 insertions(+) diff --git a/apt-pkg/algorithms.cc b/apt-pkg/algorithms.cc index fb0b7dca7..27bbe5490 100644 --- a/apt-pkg/algorithms.cc +++ b/apt-pkg/algorithms.cc @@ -119,6 +119,16 @@ bool pkgSimulate::Install(PkgIterator iPkg,string File) } bool pkgSimulate::RealInstall(PkgIterator iPkg,string /*File*/) { + // Use malloc() heavily, try trigger bug #852757 + for (int round = 1; round <= 10; round++) + { + vector arr; + for (int sz = 1; sz <= 100; sz++) + arr.push_back(malloc(sz)); + for (auto ptr: arr) + free(ptr); + } + // Adapt the iterator PkgIterator Pkg = Sim.FindPkg(iPkg.Name(), iPkg.Arch()); Flags[Pkg->ID] = 1; -- 2.30.2 From ff09c6cacb3c2b5878d44fb370bf460fd27f6ec3 Mon Sep 17 00:00:00 2001 From: Zhang Boyang Date: Thu, 2 Dec 2021 00:21:48 +0800 Subject: [PATCH 2/2] Fix incorrect SIGWINCH handling Previously, status line is redrawn in signal handler. However, the drawing code make heavy use of std::string and other syscalls, which may not be async-signal-safe. This will cause deadlock, overwritten errno, even silent memory corruption. This patch implemented Anders Kaseorg's idea. The signal handler will only set a flag, which is async-signal-safe, and actual redrawing will be deferred to PackageManagerFancy::Pulse(). Note that the virtual function PackageManagerFancy::Pulse() already exists in base class but newly overridden in PackageManagerFancy, so the ABI
Bug#852757: apt calls malloc inside SIGWINCH handler, leading to deadlock
Hi, On Thu, Nov 25, 2021 at 07:30:45PM +0800, Zhang Boyang wrote: > It seems my last email was marked spam. It's strange that the bug tracker > has received my reply but the mailing list didn't. Please correct me if I > misunderstand the situation. Indeed, looks like your mail was dropped at some point. If you supply sufficient details listmasters might be able to tell you what exactly went wrong. (The BTS seems to assign your mails ~ -10 points which is very good, but at the same time the list mail which passed through got 0, which is not bad, but also not particular good citing e.g. "MIME_BASE64_TEXT_BOGUS(1) R_DKIM_REJECT(1)". With patches you might be better of in salsa btw.) > The patch is attached again for convenience. This patch implemented Anders's > idea. The signal handler will only set a flag. After signal handler > completed, the pselect() in pkgDPkgPM::Go() will return immediately with > errno set to EINTR. Then, progress->Pluse() is called by pkgDPkgPM::Go(), > and PackageManagerFancy::Pluse() will redraw the status bar. Not having looked at the issue itself I just gave the patch a casual glance: PackageManagerFancy is a public – hence exported – interface, so you can't change member variables or add virtual methods as that breaks the ABI. At some point we will have one I am sure, but if at all possible it could be better to investigate if that can be solved some other way. (Not exactly sure why that is public at all, given there seems to also be a factory function to be able to hide it?) Can we perhaps make it (still) easier to reproduce? apt can be told to use a different dpkg via Dir::Bin::dpkg – can we have that be a script which just spits out 'offending' lines (= is it the fd messages or the messages on stdout for example – or both) ? (Sorry for not being very specific at the moment as I am a bit starved for Debian time; will try to give that some proper thought/review ~next week) Best regards David Kalnischkies signature.asc Description: PGP signature
Bug#852757: apt calls malloc inside SIGWINCH handler, leading to deadlock
Hello, It seems my last email was marked spam. It's strange that the bug tracker has received my reply but the mailing list didn't. Please correct me if I misunderstand the situation. My previous reply can be viewed at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=852757 The patch is attached again for convenience. This patch implemented Anders's idea. The signal handler will only set a flag. After signal handler completed, the pselect() in pkgDPkgPM::Go() will return immediately with errno set to EINTR. Then, progress->Pluse() is called by pkgDPkgPM::Go(), and PackageManagerFancy::Pluse() will redraw the status bar. I looked around for similar bugs. I found #947279 might be the same bug, but I don't have much evidence. Best regards, Zhang BoyangFrom 6cdd7fe9476a8149bc5bf18f70f9a8a30ba92d3a Mon Sep 17 00:00:00 2001 From: Zhang Boyang Date: Wed, 24 Nov 2021 23:34:04 +0800 Subject: [PATCH] Fix incorrect SIGWINCH handling Previously, status line is redrawn in signal handler. However, the drawing code make heavy use of std::string and other syscalls, which may not be async-signal-safe. This will cause deadlock, overwritten errno, even silent memory corruption. With this patch, the signal handler will only set a flag, which is async-signal-safe, and actual redrawing will be deferred to Pulse(). Closes: #852757 --- apt-pkg/install-progress.cc | 31 +-- apt-pkg/install-progress.h | 8 +--- 2 files changed, 26 insertions(+), 13 deletions(-) diff --git a/apt-pkg/install-progress.cc b/apt-pkg/install-progress.cc index aadd28e51..99f16bffa 100644 --- a/apt-pkg/install-progress.cc +++ b/apt-pkg/install-progress.cc @@ -222,22 +222,22 @@ PackageManagerFancy::PackageManagerFancy() : d(NULL), child_pty(-1) { // setup terminal size - old_SIGWINCH = signal(SIGWINCH, PackageManagerFancy::staticSIGWINCH); - instances.push_back(this); + if (instances++ == 0) + old_SIGWINCH = signal(SIGWINCH, PackageManagerFancy::staticSIGWINCH); } -std::vector PackageManagerFancy::instances; +int PackageManagerFancy::instances = 0; +sighandler_t PackageManagerFancy::old_SIGWINCH; +volatile sig_atomic_t PackageManagerFancy::SIGWINCH_flag = 0; PackageManagerFancy::~PackageManagerFancy() { - instances.erase(find(instances.begin(), instances.end(), this)); - signal(SIGWINCH, old_SIGWINCH); + if (--instances == 0) + signal(SIGWINCH, old_SIGWINCH); } -void PackageManagerFancy::staticSIGWINCH(int signum) +void PackageManagerFancy::staticSIGWINCH(int) { - std::vector::const_iterator I; - for(I = instances.begin(); I != instances.end(); ++I) - (*I)->HandleSIGWINCH(signum); + SIGWINCH_flag = 1; } PackageManagerFancy::TermSize @@ -294,13 +294,24 @@ void PackageManagerFancy::SetupTerminalScrollArea(int nr_rows) } } -void PackageManagerFancy::HandleSIGWINCH(int) +void PackageManagerFancy::HandleSIGWINCH() { int const nr_terminal_rows = GetTerminalSize().rows; SetupTerminalScrollArea(nr_terminal_rows); DrawStatusLine(); } +void PackageManagerFancy::Pulse() +{ + if (SIGWINCH_flag) + { + SIGWINCH_flag = 0; + int errsv = errno; + HandleSIGWINCH(); + errno = errsv; + } +} + void PackageManagerFancy::Start(int a_child_pty) { child_pty = a_child_pty; diff --git a/apt-pkg/install-progress.h b/apt-pkg/install-progress.h index 94b66ed9b..d1dd3eb8a 100644 --- a/apt-pkg/install-progress.h +++ b/apt-pkg/install-progress.h @@ -125,12 +125,14 @@ namespace Progress { void * const d; private: APT_HIDDEN static void staticSIGWINCH(int); -static std::vector instances; +static int instances; +static sighandler_t old_SIGWINCH; +static volatile sig_atomic_t SIGWINCH_flag; APT_HIDDEN bool DrawStatusLine(); protected: void SetupTerminalScrollArea(int nr_rows); -void HandleSIGWINCH(int); +void HandleSIGWINCH(); typedef struct { int rows; @@ -138,12 +140,12 @@ namespace Progress { } TermSize; TermSize GetTerminalSize(); -sighandler_t old_SIGWINCH; int child_pty; public: PackageManagerFancy(); virtual ~PackageManagerFancy(); +virtual void Pulse() APT_OVERRIDE; virtual void Start(int child_pty=-1) APT_OVERRIDE; virtual void Stop() APT_OVERRIDE; virtual bool StatusChanged(std::string PackageName, -- 2.30.2
Bug#852757: apt calls malloc inside SIGWINCH handler, leading to deadlock
Control: notfound -1 1.4~beta4ubuntu1 Control: found -1 1.4~beta4 Hi, On Thu, Jan 26, 2017 at 08:15:30PM -0500, Anders Kaseorg wrote: > Package: apt > Version: 1.4~beta4ubuntu1 > > (I also checked the code of 1.4~beta4, hence reporting here.) Thanks! Next time please don't report with a ubuntu version string through as the BTS doesn't know ubuntu version and will hence be a bit confused. I am fixing this with the control headers above. > I just had an ‘apt install’ process freeze when I resized its terminal > window. Attaching gdb revealed the backtrace below, showing that free() > was interrupted by the SIGWINCH handler, which calls Configuration::FindB, > which builds the first argument of checkFindConfigOptionType with the > std::string(const char *) constructor, which tries to call malloc(), > leading to a deadlock. Mhhh. the checkFindConfigOptionType path is new, was added by me (in ~beta3) and there is actually some benefit in avoiding the std::string by default (as by default that method will do nothing. Whitelisting is at that point really a developer only mode), so I committed a simple optimisation in that direction eliminating that malloc… > Probably the most straightforward solution is to have the signal handler > do nothing other than set a volatile sig_atomic_t flag, and defer the real > work to some safe point outside the handler. … but while "fixing" this specific instance the handler is far from trivial with multiple std::string creations (although, those might qualify for small-string optimisation), output generation (which perhaps uses only safe operations internally, but who knows) and and and … (since at least 1.1) which is far from trivial to change as these progress handlers have no event loop by themselves. They are called in reaction to messages received from dpkg which can be far and apart. They especially don't happen together with "normal" output of dpkg (or the things it runs in maintainerscripts) so that output will be "happily" garbled if we don't react to the window change 'instantly'… I have no immediate idea at least but with my mentioned commit we are at least back to a pre-beta3 state which seemed to have worked for a while now… (unsurprisingly, I couldn't reproduce the problem in a few test runs before or after – so just try to be lucky, I guess ;) ). Best regards David Kalnischkies signature.asc Description: PGP signature
Bug#852757: apt calls malloc inside SIGWINCH handler, leading to deadlock
Package: apt Version: 1.4~beta4ubuntu1 (I also checked the code of 1.4~beta4, hence reporting here.) I just had an ‘apt install’ process freeze when I resized its terminal window. Attaching gdb revealed the backtrace below, showing that free() was interrupted by the SIGWINCH handler, which calls Configuration::FindB, which builds the first argument of checkFindConfigOptionType with the std::string(const char *) constructor, which tries to call malloc(), leading to a deadlock. malloc() is unsafe to call within a signal hander (https://www.securecoding.cert.org/confluence/display/c/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers). Probably the most straightforward solution is to have the signal handler do nothing other than set a volatile sig_atomic_t flag, and defer the real work to some safe point outside the handler. Anders #0 __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95 #1 0x7fb4d0b6de82 in __GI___libc_malloc (bytes=140414575876864) at malloc.c:2923 #2 0x7fb4d1155af8 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 #3 0x7fb4d14ee54d in std::__cxx11::basic_string::_M_construct (this=0x7fffda2fe130, __beg=0x7fb4d15be6c1 "Debug::InstallProgress::Fancy", __end=) at /usr/include/c++/6/bits/basic_string.tcc:219 #4 0x7fb4d14f19b5 in std::__cxx11::basic_string ::_M_construct_aux ( __end=, __beg=0x7fb4d15be6c1 "Debug::InstallProgress::Fancy", this=0x7fffda2fe130) at /usr/include/c++/6/bits/basic_string.h:196 #5 std::__cxx11::basic_string ::_M_construct (__end=, __beg=0x7fb4d15be6c1 "Debug::InstallProgress::Fancy", this=0x7fffda2fe130) at /usr/include/c++/6/bits/basic_string.h:215 #6 std::__cxx11::basic_string ::basic_string (__a=..., __s=0x7fb4d15be6c1 "Debug::InstallProgress::Fancy", this=0x7fffda2fe130) at /usr/include/c++/6/bits/basic_string.h:456 #7 Configuration::FindB (this=0x561e68c59c20, Name=Name@entry=0x7fb4d15be6c1 "Debug::InstallProgress::Fancy", Default=Default@entry=@0x7fffda2fe19f: false) at ./apt-pkg/contrib/configuration.cc:446 #8 0x7fb4d1579700 in APT::Progress::PackageManagerFancy::GetTerminalSize ( this=this@entry=0x561e69344bf0) at ./apt-pkg/install-progress.cc:253 #9 0x7fb4d157a379 in APT::Progress::PackageManagerFancy::HandleSIGWINCH ( this=0x561e69344bf0) at ./apt-pkg/install-progress.cc:299 #10 0x7fb4d157a3bf in APT::Progress::PackageManagerFancy::staticSIGWINCH ( signum=28) at ./apt-pkg/install-progress.cc:240 #11 #12 0x7fb4d0b6a358 in _int_free (av=0x7fb4d0ea9b00 , p=0x561e69a6a890, have_lock=0) at malloc.c:4040 #13 0x7fb4d0b6e18c in __GI___libc_free (mem=) at malloc.c:2982 #14 0x7fb4d157c918 in metaIndex::~metaIndex (this=0x561e699f0100, __in_chrg=) at ./apt-pkg/metaindex.cc:45 #15 0x7fb4d1524ee9 in debReleaseIndex::~debReleaseIndex ( this=0x561e699f0100, __in_chrg=) at ./apt-pkg/deb/debmetaindex.cc:111 #16 0x7fb4d1534dd6 in debSLTypeDebian::CreateItemInternal ( this=0x7fb4d17f6800 <_apt_DebSrcType>, List=..., URI=..., Dist=..., Section="universe", IsSrc=IsSrc@entry=@0x7fffda2fee67: true, Options=std::map with 1 elements = {...}) at ./apt-pkg/deb/debmetaindex.cc:1156 #17 0x7fb4d1535005 in debSLTypeDebSrc::CreateItem (this=, List=..., URI=..., Dist=..., Section=..., Options=...) at ./apt-pkg/deb/debmetaindex.cc:1199 #18 0x7fb4d15a0cfd in pkgSourceList::Type::ParseLine ( this=0x7fb4d17f6800 <_apt_DebSrcType>, List=std::vector of length 9, capacity 16 = {...}, Buffer=, CurLine=18, File="/etc/apt/sources.list") at ./apt-pkg/sourcelist.cc:271 #19 0x7fb4d159e892 in pkgSourceList::ParseFileOldStyle ( this=this@entry=0x561e6974cf10, File="/etc/apt/sources.list") at ./apt-pkg/sourcelist.cc:415 #20 0x7fb4d15a4388 in pkgSourceList::ReadAppend ( this=this@entry=0x561e6974cf10, File="/etc/apt/sources.list") at ./apt-pkg/sourcelist.cc:364 #21 0x7fb4d15a4715 in pkgSourceList::ReadMainList ( this=this@entry=0x561e6974cf10) at ./apt-pkg/sourcelist.cc:319 #22 0x7fb4d14d68c6 in pkgCacheFile::BuildSourceList ( this=this@entry=0x7fffda2ff880) at ./apt-pkg/cachefile.cc:145 #23 0x7fb4d14d768b in pkgCacheFile::BuildCaches ( this=this@entry=0x7fffda2ff880, Progress=Progress@entry=0x0, WithLock=WithLock@entry=true) at ./apt-pkg/cachefile.cc:107 #24 0x7fb4d154c28e in pkgDPkgPM::Go (this=0x561e697fbac0, progress=) at ./apt-pkg/deb/dpkgpm.cc:2154 #25 0x7fb4d1587c4b in pkgPackageManager::DoInstallPostFork ( this=0x561e697fbac0, progress=0x561e69344bf0) at ./apt-pkg/packagemanager.cc:1154 #26 0x7fb4d181cb0d in InstallPackages(CacheFile&, bool, bool, bool) ()