Neal Richardson created ARROW-9380:
--------------------------------------

Summary: [C++] Segfaults in compute::CallFunction
Key: ARROW-9380
URL: https://issues.apache.org/jira/browse/ARROW-9380
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Neal Richardson

I triggered these from R, so that's what the reproducers are in.

1. Calling "filter" with no args segfaults.

{code:r}
arrow:::compute__CallFunction("filter", list(), list(keep_na = FALSE))
{code}

Top of the backtrace from lldb:

{code}
  * frame #0: 0x0000000109e1c2c7 libarrow.100.dylib`arrow::Datum::type() const + 7
    frame #1: 0x000000010a14a232 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::FilterMetaFunction::ExecuteImpl(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 66
    frame #2: 0x0000000109fc32c9 libarrow.100.dylib`arrow::compute::MetaFunction::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 41
    frame #3: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
    frame #4: 0x0000000109fb3c47 libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 599
{code}

This is not the case with at least some other functions. If I try to call "sum" with no args, I get {{Invalid: Function accepts 1 arguments but passed 0}} and no segfault.

2. Something is strange with is_null. It creates what appears to be a valid boolean array, but if I pass it to filter, it segfaults.
I'm adding bindings for this in ARROW-9187 but this should run on current master:

{code:r}
library(arrow)
a <- Array$create(1:4)
b <- arrow:::shared_ptr(Array, arrow:::call_function("is_null", a))
a$Filter(b)
{code}

Backtrace:

{code}
  * frame #0: 0x000000010a120bb6 libarrow.100.dylib`arrow::compute::internal::GetFilterOutputSize(arrow::ArrayData const&, arrow::compute::FilterOptions::NullSelectionBehavior) + 38
    frame #1: 0x000000010a125659 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::PrimitiveFilter(arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*) + 121
    frame #2: 0x0000000109fbbea4 libarrow.100.dylib`arrow::compute::detail::VectorExecutor::ExecuteBatch(arrow::compute::ExecBatch const&, arrow::compute::detail::ExecListener*) + 996
    frame #3: 0x0000000109fba3e6 libarrow.100.dylib`arrow::compute::detail::VectorExecutor::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::detail::ExecListener*) + 150
    frame #4: 0x0000000109fc0948 libarrow.100.dylib`arrow::compute::Function::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 1016
    frame #5: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
    frame #6: 0x000000010a14a9b5 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::FilterMetaFunction::ExecuteImpl(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 1989
    frame #7: 0x0000000109fc32c9 libarrow.100.dylib`arrow::compute::MetaFunction::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 41
    frame #8: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
    frame #9: 0x0000000109fb3c47 libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 599
{code}

BUT: if I call {{as.vector}} on {{b}} before using it as a Filter, it works--even though I've discarded the as.vector result and am still using the Array to filter.

{code:r}
library(arrow)
a <- Array$create(1:4)
b <- arrow:::shared_ptr(Array, arrow:::call_function("is_null", a))
as.vector(b)
a$Filter(b)
{code}

Just printing (calling {{ToString}}) on {{b}} doesn't prevent the segfault.

And I have not observed this with other boolean kernels. E.g. this does not segfault:

{code:r}
library(arrow)
a <- Array$create(1:4)
b <- arrow:::shared_ptr(Array, arrow:::call_function("greater", a, Scalar$create(3L)))
a$Filter(b)
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
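
Regarding crash 1: the backtrace ends in {{arrow::Datum::type()}} called from {{FilterMetaFunction::ExecuteImpl}}, which suggests the meta-function may be reading an argument before the argument count has been validated, whereas "sum" rejects the call with an {{Invalid}} status. The following is only a minimal, self-contained sketch of that kind of guard, not Arrow's actual code. {{FakeDatum}}, {{CheckArity}} and {{ExecuteFilterSketch}} are hypothetical names invented for illustration, errors are returned as plain strings rather than an {{arrow::Status}}, and the expected arity of 2 (values plus filter) is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::Datum; NOT the real class.
struct FakeDatum {
  int type_id;
};

// Hypothetical helper: returns an empty string when the argument count
// matches, otherwise a message modelled on the one "sum" reports above.
std::string CheckArity(const std::vector<FakeDatum>& args, std::size_t expected) {
  if (args.size() != expected) {
    return "Invalid: Function accepts " + std::to_string(expected) +
           " arguments but passed " + std::to_string(args.size());
  }
  return "";
}

// Hypothetical meta-function body: validate the count first, and only
// then index into args. Assumes filter takes two arguments.
std::string ExecuteFilterSketch(const std::vector<FakeDatum>& args) {
  const std::string err = CheckArity(args, 2);
  if (!err.empty()) {
    return err;  // Bail out before touching args[0] or args[1].
  }
  // Safe: both elements are known to exist at this point.
  return "ok: values type " + std::to_string(args[0].type_id) +
         ", filter type " + std::to_string(args[1].type_id);
}
```

With this guard, calling {{ExecuteFilterSketch}} on an empty vector returns an "Invalid" message instead of indexing past the end of the vector, which is the behaviour "sum" already shows for a zero-argument call.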