On Fri, Feb 19, 2021 at 01:06:43PM +0100, Otto Moerbeek wrote:
> On Fri, Feb 19, 2021 at 12:45:58PM +0100, Mark Kettenis wrote:
>
> > > Date: Fri, 19 Feb 2021 10:57:30 +0100
> > > From: Otto Moerbeek
> > >
> > > Hi,
> > >
> > > working on PowerDNS Recursor, once in a while I'm seeing:
> > >
> > > #0 0x000009fd67ef09dc in
> > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > > (this=<optimized out>,
> > > head=0x9fd67efc8e8 <libunwind::uwis_cache+8>, elm=0x9fca04be900)
> > > at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > 243 RB_GENERATE(CacheTree, CacheItem, entry, CacheCmp);
> > > [Current thread is 1 (process 349420)]
> > > (gdb) bt
> > > #0 0x000009fd67ef09dc in
> > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > > (this=<optimized out>,
> > > head=0x9fd67efc8e8 <libunwind::uwis_cache+8>, elm=0x9fca04be900)
> > > at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > #1 0x000009fd67eeddef in
> > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT
> > > (this=<optimized out>,
> > > head=<optimized out>, elm=<optimized out>)
> > > at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > #2 libunwind::UnwindInfoSectionsCache::setUnwindInfoSectionsForPC
> > > (this=<optimized out>, key=10983975073074,
> > > uis=...) at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:237
> > > #3 libunwind::UnwindCursor<libunwind::LocalAddressSpace,
> > > libunwind::Registers_x86_64>::setInfoBasedOnIPRegister (
> > > this=0x9fd2ca0aa68, isReturnAddress=<optimized out>)
> > > at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:1891
> > > #4 0x000009fd67eedaa4 in
> > > libunwind::UnwindCursor<libunwind::LocalAddressSpace,
> > > libunwind::Registers_x86_64>::step (
> > > this=0x9fd2ca0aa68) at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:2031
> > > #5 0x000009fd67ef15a4 in unwind_phase1 (uc=<optimized out>,
> > > cursor=<optimized out>, exception_object=0x9fd37b24560)
> > > at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:46
> > > #6 _Unwind_RaiseException (exception_object=0x9fd37b24560)
> > > at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:363
> > > #7 0x000009fd67eeb533 in __cxa_throw (thrown_object=0x9fd37b24580,
> > > tinfo=0x9fa6c615a00 <typeinfo for PDNSException>, dest=<optimized
> > > out>)
> > > at
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libcxxabi/src/cxa_exception.cpp:279
> > > #8 0x000009fa6c295955 in ComboAddress::ComboAddress (this=<optimized
> > > out>, str=..., port=<optimized out>)
> > > at ./iputils.hh:219
> > > #9 0x000009fa6c489970 in startFrameStreamServers (config=...) at
> > > pdns_recursor.cc:1248
> > > #10 checkFrameStreamExport (luaconfsLocal=...) at pdns_recursor.cc:1290
> > > #11 0x000009fa6c48158f in recursorThread (n=<optimized out>,
> > > ...
> > >
> > > This does not always happen; most of the time the exception is
> > > handled correctly, afaik.
> > >
> > > The code that throws the exception is:
> > > try {
> > > ComboAddress address(server);
> > > ...
> > > }
> > > catch ...
> > >
> > > The ComboAddress constructor throws the exception (and is supposed to
> > > do that). It looks like libunwind gets confused somehow.
> > >
> > > Any clue?
> >
> > The cache that pirofti@ added a while ago isn't thread-safe. Or maybe
> > this is a use-after-free caused by dlclose(3). We should probably
> > disable/remove it while he is working on a better solution.
> > Unfortunately I don't think adding locking here is a good idea, so this
> > may need a more fundamental rethink.
> >
> > Upstream did add an optimization in this area a few months ago:
> >
> >
> > https://github.com/llvm/llvm-project/commit/881aba7071c6e4cc2417e875ca5027ec7c0a92a3
> >
> > The version of libunwind we're using is older than that, so it may be
> > worth picking that up to see whether it improves the original problem.
>
> First I'm going to try to fix it by making the cache thread_local.
>
> I'm probably going to regret looking at this code,
>
> -Otto
The diff below works for my test case on amd64.
It also feels right from a theoretical point of view. As for practical
matters, if thread local storage isn't working properly, a lot more
things would break. So I am a bit more optimistic about the diff now
than this morning.
So we have three options, I think:
- back the caching out,
- investigate the commit you mention above. Sadly I cannot
remember the original case that prompted adding the caching code.
- continue on the thread_local path
We *have* to pick a solution. Having broken exception handling for
multi-threaded applications is no fun at all...
Otto
>
> Index: UnwindCursor.hpp
> ===================================================================
> RCS file: /cvs/src/gnu/llvm/libunwind/src/UnwindCursor.hpp,v
> retrieving revision 1.2
> diff -u -p -r1.2 UnwindCursor.hpp
> --- UnwindCursor.hpp 2 Jan 2021 01:10:02 -0000 1.2
> +++ UnwindCursor.hpp 19 Feb 2021 12:05:26 -0000
> @@ -75,7 +75,7 @@ extern "C" _Unwind_Reason_Code __libunwi
>
> namespace libunwind {
>
> -static UnwindInfoSectionsCache uwis_cache;
> +static thread_local UnwindInfoSectionsCache uwis_cache;
>
> #if defined(_LIBUNWIND_SUPPORT_DWARF_UNWIND)
> /// Cache of recently found FDEs.
>