[https://issues.apache.org/jira/browse/KUDU-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835086#comment-17835086]

Alexey Serbin commented on KUDU-3545:
-------------------------------------

I haven't tracked down the exact root cause of the crash, but I suspect it's
the issue described in KUDU-2068, i.e. ABI incompatibilities between GCC
toolchains of different versions.

In essence, Kudu's third-party CLANG (used to generate {{precompiled.ll}})
picks up the latest GCC toolchain available on the build machine, while the
rest of Kudu is built with a toolchain of a different version (e.g., think of
the GCC7-based and GCC13-based toolchains on SLES15).  If the two toolchains
disagree on the size of an STL-based type, or on anything else that's passed
between the auto-generated code derived from {{precompiled.ll}} and the rest of
the {{kudu-tserver}} runtime, there is a risk of memory corruption or, if you
are lucky, an immediate crash of the {{kudu-tserver}} process (or of
{{codegen-test}}).
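To make that failure mode concrete, here is a toy Python model (not Kudu code) of two pieces of code that disagree on the size of an STL-based member of the same struct. The two layouts, the 8- vs 32-byte member sizes, and the {{produce}}/{{consume}} helpers are hypothetical and chosen only for illustration. The {{<}} prefix deliberately ignores alignment and padding, which real C++ structs also have.

```python
import struct

# Two hypothetical layouts of one C++ struct {StlType member; int64_t payload;}
# as seen by code compiled against two different toolchains. Only the size of
# the STL-based member differs; the sizes are illustrative, not measured.
LAYOUT_OLD_TOOLCHAIN = struct.Struct("<8sq")   # 8-byte STL member + int64 payload
LAYOUT_NEW_TOOLCHAIN = struct.Struct("<32sq")  # 32-byte STL member + int64 payload


def produce(memory: bytearray, layout: struct.Struct, payload: int) -> None:
    """Write the struct at offset 0, the way code built with `layout` would."""
    # The STL member's bytes are opaque here; struct truncates them to fit.
    layout.pack_into(memory, 0, b"\x11" * 32, payload)


def consume(memory: bytearray, layout: struct.Struct) -> int:
    """Read the payload field back, the way code built with `layout` would."""
    _member, payload = layout.unpack_from(memory, 0)
    return payload


# Simulated heap: the struct is followed by unrelated bytes (0xCD filler).
heap = bytearray(b"\xcd" * 64)
produce(heap, LAYOUT_OLD_TOOLCHAIN, 42)

print(consume(heap, LAYOUT_OLD_TOOLCHAIN))  # same layout on both sides: 42
print(consume(heap, LAYOUT_NEW_TOOLCHAIN))  # mismatched layout: filler bytes, not 42
```

No error is raised on the mismatched read. The consumer silently picks up whatever happens to sit past the end of the smaller struct, which is why such a mismatch can show up as memory corruption rather than as an immediate crash.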

CLANG's behavior of picking up the latest version of the GCC toolchain it can
find is described in
[its documentation|https://clang.llvm.org/docs/ClangCommandLineReference.html#dumping-preprocessor-state];
see the paragraph for the {{\-\-gcc-toolchain}} option.  Newer versions of
CLANG (starting with 16.0.0) offer a better alternative to the
{{\-\-gcc-toolchain}} flag: {{\-\-gcc-install-dir}} (see
[this LLVM Discourse thread|https://discourse.llvm.org/t/add-gcc-install-dir-deprecate-gcc-toolchain-and-remove-gcc-install-prefix/65091]
for more details).  I guess we should employ this option once Kudu's
thirdparty LLVM is upgraded to 16.0.0 or newer (it's 11.0.0 as of April 2024).
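As a rough illustration, the Python sketch below mimics the "pick the highest GCC version found" behavior and shows how an explicit {{\-\-gcc-install-dir}} would pin the choice. It is a simplified model, not CLANG's actual detection code. The install paths, the {{precompiled.cc}} source name, and the {{ir_compile_args}} helper are hypothetical; Kudu's real build rule for {{precompiled.ll}} passes many more flags.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath


def parse_version(dirname: str) -> tuple[int, ...]:
    """Turn a GCC version directory name such as '13' or '7.5.0' into a tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", dirname))


def pick_latest_gcc(install_dirs: list[str]) -> str:
    """Simplified model of auto-detection: keep the highest-versioned install dir.

    Versions are compared numerically, so '13' wins over '7'; a plain string
    comparison would get this wrong ('7' > '13' lexicographically).
    """
    return max(install_dirs, key=lambda d: parse_version(PurePosixPath(d).name))


def ir_compile_args(source: str, output: str,
                    gcc_install_dir: str | None = None) -> list[str]:
    """Hypothetical clang++ invocation that emits textual LLVM IR (.ll)."""
    args = ["clang++", "-S", "-emit-llvm", source, "-o", output]
    if gcc_install_dir is not None:
        # CLANG >= 16.0.0 only: use exactly this GCC installation instead of
        # whichever one auto-detection would pick.
        args.append(f"--gcc-install-dir={gcc_install_dir}")
    return args


# Hypothetical SLES15 layout with both GCC 7 and GCC 13 installed.
candidates = [
    "/usr/lib64/gcc/x86_64-suse-linux/7",
    "/usr/lib64/gcc/x86_64-suse-linux/13",
]
print(pick_latest_gcc(candidates))  # /usr/lib64/gcc/x86_64-suse-linux/13
print(ir_compile_args("precompiled.cc", "precompiled.ll", candidates[0]))
```

With CLANG 11.0.0 only {{\-\-gcc-toolchain}} is available, and it takes an installation prefix rather than a specific version directory, so the highest-version selection still happens underneath that prefix. Pointing {{\-\-gcc-install-dir}} at the same version directory the rest of Kudu is built with removes that ambiguity.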

> codegen test fails on SLES with higher libgcc version
> -----------------------------------------------------
>
>                 Key: KUDU-3545
>                 URL: https://issues.apache.org/jira/browse/KUDU-3545
>             Project: Kudu
>          Issue Type: Bug
>          Components: codegen
>            Reporter: Ashwani Raina
>            Priority: Minor
>
> On SLES 15 with libgcc_s1-13.2.1+git7813-150000.1.6.1.x86_64,
> codegen-test fails with the following crash:
> +++
> *** SIGABRT (@0x3162e) received by PID 202286 (TID 0x7f71d1bfe700) from PID 202286; stack trace: ***
>     @     0x7f71d41f5910 (unknown)
>     @     0x7f71d2725d2b __GI_raise
>     @     0x7f71d27273e5 __GI_abort
>     @     0x7f71d28d78d7 (unknown)
>     @     0x7f71d28f1009 __deregister_frame
>     @     0x7f71d4d6c9e0 llvm::RTDyldMemoryManager::deregisterEHFrames()
>     @     0x7f71d4976b02 llvm::MCJIT::~MCJIT()
>     @     0x7f71d4977241 llvm::MCJIT::~MCJIT()
>     @     0x7f71d481c222 std::default_delete<>::operator()()
>     @     0x7f71d481c12d std::unique_ptr<>::~unique_ptr()
>     @     0x7f71d481bfaf kudu::codegen::JITWrapper::~JITWrapper()
>     @     0x7f71d4835f34 kudu::codegen::RowProjectorFunctions::~RowProjectorFunctions()
>     @     0x7f71d4835f50 kudu::codegen::RowProjectorFunctions::~RowProjectorFunctions()
>     @           0x46297c kudu::RefCountedThreadSafe<>::DeleteInternal()
>     @           0x45f3d1 kudu::DefaultRefCountedThreadSafeTraits<>::Destruct()
>     @           0x45acb0 kudu::RefCountedThreadSafe<>::Release()
>     @     0x7f71d480c191 kudu::codegen::CodeCache::EvictionCallback::EvictedEntry()
>     @     0x7f71d3c5e4bb kudu::(anonymous namespace)::CacheShard<>::FreeEntry()
>     @     0x7f71d3c60b31 kudu::(anonymous namespace)::CacheShard<>::Insert()
>     @     0x7f71d3c5fb73 kudu::(anonymous namespace)::ShardedCache<>::Insert()
>     @     0x7f71d480bab6 kudu::codegen::CodeCache::AddEntry()
>     @     0x7f71d4811fea kudu::codegen::(anonymous namespace)::CompilationTask::RunWithStatus()
>     @     0x7f71d4811a64 kudu::codegen::(anonymous namespace)::CompilationTask::Run()
>     @     0x7f71d481288a _ZZN4kudu7codegen18CompilationManager19RequestRowProjectorEPKNS_6SchemaES4_PSt10unique_ptrINS0_12RowProjectorESt14default_deleteIS6_EEENKUlvE_clEv
>     @     0x7f71d4813e72 _ZNSt17_Function_handlerIFvvEZN4kudu7codegen18CompilationManager19RequestRowProjectorEPKNS1_6SchemaES6_PSt10unique_ptrINS2_12RowProjectorESt14default_deleteIS8_EEEUlvE_E9_M_invokeERKSt9_Any_data
>     @           0x452430 std::function<>::operator()()
>     @     0x7f71d3d98648 kudu::ThreadPool::DispatchThread()
>     @     0x7f71d3d98ee9 _ZZN4kudu10ThreadPool12CreateThreadEvENKUlvE_clEv
>     @     0x7f71d3d9a6a0 _ZNSt17_Function_handlerIFvvEZN4kudu10ThreadPool12CreateThreadEvEUlvE_E9_M_invokeERKSt9_Any_data
>     @           0x452430 std::function<>::operator()()
>     @     0x7f71d3d89482 kudu::Thread::SuperviseThread()
>     @     0x7f71d41e96ea start_thread
> +++
> From the stack trace, it seems that __deregister_frame is being fed invalid
> input, i.e. something that was already de-initialised before
> __deregister_frame was called.
> We seem to be hitting this assert:
> [https://github.com/gcc-mirror/gcc/blob/65e2c932019b4e36d7c1d49952dc006fa7419a3d/libgcc/unwind-dw2-fde.c#L291C11-L291C11]
> gcc_assert (in_shutdown || ob);
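To spell out what that assertion checks, here is a deliberately simplified Python model. It is not libgcc's code, and the class, its methods, and the address used are hypothetical. libgcc keeps a registry of the frame data registered through __register_frame, and deregistering a frame it cannot find is tolerated only during shutdown. The model does not explain why the lookup fails in this crash; that is the open question of this issue.

```python
class FrameRegistry:
    """Toy model of the object registry in libgcc's unwind-dw2-fde.c.

    It mirrors only the rule behind gcc_assert (in_shutdown || ob): a frame
    being deregistered must have been registered (and not yet deregistered)
    unless the process is already shutting down.
    """

    def __init__(self) -> None:
        self._objects = {}  # frame start address -> who registered it
        self.in_shutdown = False

    def register_frame(self, begin: int, owner: str) -> None:
        self._objects[begin] = owner

    def deregister_frame(self, begin: int) -> None:
        ob = self._objects.pop(begin, None)  # lookup of the registered object
        if not (self.in_shutdown or ob is not None):
            # Models the assertion failing; the reported trace ends in abort().
            raise RuntimeError(f"abort: no registered object for frame {begin:#x}")


registry = FrameRegistry()
registry.register_frame(0x7f71d0000000, "MCJIT")  # hypothetical address
registry.deregister_frame(0x7f71d0000000)         # fine: it was registered
try:
    registry.deregister_frame(0x7f71d0000000)     # stale/invalid input
except RuntimeError as err:
    print(err)  # abort: no registered object for frame 0x7f71d0000000
```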



--
This message was sent by Atlassian Jira
(v8.20.10#820010)