Neal Richardson created ARROW-17974:
---------------------------------------

             Summary: [C++] random function can't actually be used
                 Key: ARROW-17974
                 URL: https://issues.apache.org/jira/browse/ARROW-17974
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Neal Richardson


random() is currently implemented as a nullary function. It doesn't let you 
specify the number of values you want to generate because it's designed to 
generate however many the given ExecBatch has. The only option RandomOptions 
takes seems to be an optional seed value. Unfortunately, the result is that the 
function is not usable, AFAICT.
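
To illustrate the design described above, here is a toy model in plain C++. It is not Arrow code: ToyBatch, ToyRandomOptions and ToyRandom are hypothetical names that stand in for ExecBatch, RandomOptions and the random kernel. The sketch assumes that a direct call amounts to a batch of length 0, which would be consistent with the empty result in the first example below.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for an execution batch: it carries only a row count.
struct ToyBatch {
  std::int64_t length;
};

// Hypothetical stand-in for the options struct: an optional seed, nothing else.
struct ToyRandomOptions {
  bool has_seed;
  std::uint64_t seed;
};

// A nullary "random" kernel. It takes no input columns and no length option,
// so the only place the output size can come from is the batch itself.
std::vector<double> ToyRandom(const ToyBatch& batch,
                              const ToyRandomOptions& options) {
  assert(batch.length >= 0);
  // Use the caller's seed if one was given, otherwise a nondeterministic one.
  std::mt19937_64 gen(options.has_seed ? options.seed : std::random_device{}());
  std::uniform_real_distribution<double> dist(0.0, 1.0);

  std::vector<double> out;
  out.reserve(static_cast<std::size_t>(batch.length));
  for (std::int64_t i = 0; i < batch.length; ++i) {
    out.push_back(dist(gen));
  }
  return out;
}
```

With a batch of length 0 the toy kernel returns an empty vector, and with a batch of length 32 it returns 32 values. The caller has no way to ask for a specific count other than through the batch.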

Calling the compute function directly, you get 0 values (all examples from R): 

{code}
library(arrow)
call_function("random")
# Array
# <double>
# []
{code}

Calling it from within an ExecPlan, it errors because it is not a proper scalar 
function, despite what the filenames say (scalar_random.cc, etc.):

{code}
library(arrow)
library(dplyr)

mtcars %>% 
  arrow_table() %>% 
  mutate(x = arrow_random()) %>% 
  collect()
# Error in `collect()`:
# ! Invalid: ExecuteScalarExpression cannot Execute non-scalar expression Array[double]
{code}




