Neal Richardson created ARROW-9380:
--------------------------------------

             Summary: [C++] Segfaults in compute::CallFunction
                 Key: ARROW-9380
                 URL: https://issues.apache.org/jira/browse/ARROW-9380
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Neal Richardson


I triggered these from R, so the reproducers are in R.

1. Calling "filter" with no args segfaults.

{code:r}
arrow:::compute__CallFunction("filter", list(), list(keep_na = FALSE))
{code}

Top of the backtrace from lldb:

{code}
  * frame #0: 0x0000000109e1c2c7 libarrow.100.dylib`arrow::Datum::type() const + 7
    frame #1: 0x000000010a14a232 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::FilterMetaFunction::ExecuteImpl(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 66
    frame #2: 0x0000000109fc32c9 libarrow.100.dylib`arrow::compute::MetaFunction::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 41
    frame #3: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
    frame #4: 0x0000000109fb3c47 libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 599
{code}

At least some other functions validate their arguments instead of crashing: if I call "sum" with no args, I get {{Invalid: Function accepts 1 arguments but passed 0}} and no segfault.
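The contrast with "sum" suggests that it checks its argument count before executing. The first backtrace, by contrast, shows {{FilterMetaFunction::ExecuteImpl}} reaching {{Datum::type()}} with an empty argument list, which suggests it reads the first argument without checking that one was passed. The following is a minimal, self-contained sketch of that kind of arity check. It assumes "filter" takes exactly two arguments (values and selection mask). {{FakeDatum}}, {{FakeStatus}}, {{CheckArity}} and {{FilterSketch}} are hypothetical stand-ins, not Arrow's API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for arrow::Datum and arrow::Status. These are NOT
// the Arrow C++ types, just enough structure to illustrate the pattern.
struct FakeDatum {
  std::string type_name;
};

struct FakeStatus {
  bool ok;
  std::string message;
};

// Hypothetical helper that produces an error message in the same shape as the
// one the report quotes for "sum": "Function accepts N arguments but passed M".
FakeStatus CheckArity(const std::vector<FakeDatum>& args, std::size_t expected) {
  if (args.size() != expected) {
    return {false, "Invalid: Function accepts " + std::to_string(expected) +
                       " arguments but passed " + std::to_string(args.size())};
  }
  return {true, ""};
}

// Hypothetical filter entry point (assumed arity: 2). Validating first means
// an empty argument list yields an error instead of reading args[0] out of
// bounds, which is undefined behavior and can crash.
FakeStatus FilterSketch(const std::vector<FakeDatum>& args) {
  FakeStatus st = CheckArity(args, 2);
  if (!st.ok) return st;
  // Only past this point is it safe to inspect args[0] and args[1].
  return {true, ""};
}
```

With this check in place, calling {{FilterSketch}} with an empty vector returns a non-OK status carrying the "accepts 2 arguments but passed 0" message rather than dereferencing a nonexistent element.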

2. Something is wrong with is_null. It returns what appears to be a valid boolean array, but passing that array to filter segfaults. I'm adding R bindings for this in ARROW-9187, but the following should run on current master:

{code:r}
library(arrow)
a <- Array$create(1:4)
b <- arrow:::shared_ptr(Array, arrow:::call_function("is_null", a))
a$Filter(b)
{code}

Backtrace:

{code}
 * frame #0: 0x000000010a120bb6 libarrow.100.dylib`arrow::compute::internal::GetFilterOutputSize(arrow::ArrayData const&, arrow::compute::FilterOptions::NullSelectionBehavior) + 38
    frame #1: 0x000000010a125659 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::PrimitiveFilter(arrow::compute::KernelContext*, arrow::compute::ExecBatch const&, arrow::Datum*) + 121
    frame #2: 0x0000000109fbbea4 libarrow.100.dylib`arrow::compute::detail::VectorExecutor::ExecuteBatch(arrow::compute::ExecBatch const&, arrow::compute::detail::ExecListener*) + 996
    frame #3: 0x0000000109fba3e6 libarrow.100.dylib`arrow::compute::detail::VectorExecutor::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::detail::ExecListener*) + 150
    frame #4: 0x0000000109fc0948 libarrow.100.dylib`arrow::compute::Function::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 1016
    frame #5: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
    frame #6: 0x000000010a14a9b5 libarrow.100.dylib`arrow::compute::internal::(anonymous namespace)::FilterMetaFunction::ExecuteImpl(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 1989
    frame #7: 0x0000000109fc32c9 libarrow.100.dylib`arrow::compute::MetaFunction::Execute(std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) const + 41
    frame #8: 0x0000000109fb3d3c libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 844
    frame #9: 0x0000000109fb3c47 libarrow.100.dylib`arrow::compute::CallFunction(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<arrow::Datum, std::__1::allocator<arrow::Datum> > const&, arrow::compute::FunctionOptions const*, arrow::compute::ExecContext*) + 599
{code}

BUT: if I call {{as.vector}} on {{b}} before using it as a filter, it works, even though I discard the {{as.vector}} result and still filter with the same Array:

{code:r}
library(arrow)
a <- Array$create(1:4)
b <- arrow:::shared_ptr(Array, arrow:::call_function("is_null", a))
as.vector(b)
a$Filter(b)
{code}

Just printing {{b}} (which calls {{ToString}}) doesn't prevent the segfault. I have not observed this with other boolean kernels; for example, this does not segfault:

{code:r}
library(arrow)
a <- Array$create(1:4)
b <- arrow:::shared_ptr(Array, arrow:::call_function("greater", a, Scalar$create(3L)))
a$Filter(b)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
