[jira] [Created] (ARROW-17624) [C++][Acero] Window Functions: add helper classes for frame calculation

2022-09-05 Thread Michal Nowakiewicz (Jira)
Michal Nowakiewicz created ARROW-17624:

 Summary: [C++][Acero] Window Functions: add helper classes for frame calculation
 Key: ARROW-17624
 URL: https://issues.apache.org/jira/browse/ARROW-17624
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 10.0.0
Reporter: Michal Nowakiewicz
Assignee: Michal Nowakiewicz
 Fix For: 10.0.0








[jira] [Created] (ARROW-17623) [C++][Acero] Window Functions: add helper classes for ranking

2022-09-05 Thread Michal Nowakiewicz (Jira)
Michal Nowakiewicz created ARROW-17623:

 Summary: [C++][Acero] Window Functions: add helper classes for ranking
 Key: ARROW-17623
 URL: https://issues.apache.org/jira/browse/ARROW-17623
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 10.0.0
Reporter: Michal Nowakiewicz
 Fix For: 10.0.0








[jira] [Created] (ARROW-17622) [C++] Order-aware non-sink Fetch Node

2022-09-05 Thread Vibhatha Lakmal Abeykoon (Jira)
Vibhatha Lakmal Abeykoon created ARROW-17622:


 Summary: [C++] Order-aware non-sink Fetch Node
 Key: ARROW-17622
 URL: https://issues.apache.org/jira/browse/ARROW-17622
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon


Considering the existing sink nodes and the newly introduced fetch node with sort 
capability, we will only need two nodes in the long run: "sort" and "fetch". Once 
ordered execution is integrated, some features can be removed. Right now there are 
three nodes, "order_by_sink", "fetch_sink", and "select_k_sink", doing closely 
related work, which is redundant under unordered execution. Eventually one of them 
will go away, none of them will remain sink nodes, and the sorting behaviour will 
be removed from "fetch".

The task breakdown needs to be determined; it is better to keep a few sub-tasks. 
A sketch of the intended end state is below.
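
For orientation, a minimal sketch of the intended end state using the existing 
{{arrow::compute::Declaration}} API. The "fetch" factory name and its options 
type are hypothetical, since this issue is precisely what would introduce them:

{code:cpp}
#include <arrow/compute/exec/exec_plan.h>
#include <arrow/compute/exec/options.h>

namespace cp = arrow::compute;

// Hypothetical options for the proposed non-sink fetch node, declared here
// only so the sketch is self-contained; not part of the current API.
struct FetchNodeOptions : public cp::ExecNodeOptions {
  FetchNodeOptions(int64_t offset, int64_t count)
      : offset(offset), count(count) {}
  int64_t offset;
  int64_t count;
};

// Intended long-run shape of a "top N in order" query: an ordered source
// followed by a plain fetch node, with no sink-specific variants involved.
cp::Declaration SortedHead(cp::Declaration sorted_source) {
  // "fetch" skips `offset` rows and emits the next `count` rows; it has no
  // sorting behaviour of its own, the ordering comes from upstream.
  return cp::Declaration::Sequence({
      std::move(sorted_source),
      {"fetch", FetchNodeOptions(/*offset=*/0, /*count=*/100)},
  });
}
{code}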





[jira] [Created] (ARROW-17621) [CI] Audit workflows

2022-09-05 Thread Jacob Wujciak-Jens (Jira)
Jacob Wujciak-Jens created ARROW-17621:

 Summary: [CI] Audit workflows
 Key: ARROW-17621
 URL: https://issues.apache.org/jira/browse/ARROW-17621
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
 Fix For: 10.0.0


Set minimal permissions for the workflow token, check for outdated actions, pin action SHAs, etc. A sketch is below.
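
For illustration, a minimal GitHub Actions sketch of the kind of hardening meant 
here; the workflow content and the zeroed SHA are placeholders, not taken from 
the Arrow repository:

{code:yaml}
name: example-hardened-workflow
on: [push]

# Default the GITHUB_TOKEN to the least privilege the jobs need.
permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # Pin third-party actions to a full commit SHA instead of a mutable
      # tag like @v3 (the SHA below is a placeholder).
      - uses: actions/checkout@0000000000000000000000000000000000000000
{code}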





[jira] [Created] (ARROW-17620) [R] as_arrow_array() ignores type argument for StructArrays

2022-09-05 Thread Jira
François Michonneau created ARROW-17620:

 Summary: [R] as_arrow_array() ignores type argument for 
StructArrays
 Key: ARROW-17620
 URL: https://issues.apache.org/jira/browse/ARROW-17620
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: François Michonneau


While `Array$create()` respects the types provided by the `type` argument, they 
are ignored when using `as_arrow_array()`. Compare the output below:

 
{code:java}
library(arrow, warn.conflicts = FALSE)

dataset <- data.frame(
  a = 1,
  b = 2,
  c = 3
)

types <- struct(a = int16(), b = int32(), c = int64())

as_arrow_array(
  dataset,
  type = types
)$type
#> StructType
#> struct<a: double, b: double, c: double>

Array$create(
  dataset,
  type = types
)$type
#> StructType
#> struct<a: int16, b: int32, c: int64>
{code}

I have identified the bug and will submit a PR.





[jira] [Created] (ARROW-17619) [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer

2022-09-05 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17619:

 Summary: [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet 
writer
 Key: ARROW-17619
 URL: https://issues.apache.org/jira/browse/ARROW-17619
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Parquet
Reporter: Rok Mihevc


PARQUET-492 added a DELTA_BYTE_ARRAY decoder, but we don't have a corresponding encoder.
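
If/when an encoder lands, it would presumably be requested through the existing 
writer properties; a sketch, assuming the current {{WriterProperties}} builder 
API (today this combination fails at write time because only the decoder exists):

{code:cpp}
#include <memory>

#include <parquet/properties.h>
#include <parquet/types.h>

// Request DELTA_BYTE_ARRAY as the column encoding. Dictionary encoding is
// disabled so the requested encoding is actually used rather than the
// dictionary fallback.
std::shared_ptr<parquet::WriterProperties> MakeDeltaByteArrayProps() {
  return parquet::WriterProperties::Builder()
      .disable_dictionary()
      ->encoding(parquet::Encoding::DELTA_BYTE_ARRAY)
      ->build();
}
{code}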





[jira] [Created] (ARROW-17618) [Doc] Add Flight SQL to implementation status page

2022-09-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17618:

 Summary: [Doc] Add Flight SQL to implementation status page
 Key: ARROW-17618
 URL: https://issues.apache.org/jira/browse/ARROW-17618
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation
Reporter: Antoine Pitrou


At some point, we should probably add a dedicated section for Flight SQL to 
https://arrow.apache.org/docs/dev/status.html







[jira] [Created] (ARROW-17617) [Doc] Remove experimental marker for Flight RPC in feature matrix

2022-09-05 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17617:

 Summary: [Doc] Remove experimental marker for Flight RPC in 
feature matrix
 Key: ARROW-17617
 URL: https://issues.apache.org/jira/browse/ARROW-17617
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation
Reporter: Antoine Pitrou
 Fix For: 10.0.0


In https://arrow.apache.org/docs/dev/status.html, Flight RPC is still marked 
experimental. We should probably remove that mention, as it was already removed 
from the corresponding format specification page 
(https://arrow.apache.org/docs/dev/format/Flight.html).





[GitHub] [arrow-julia] svilupp opened a new issue, #335: Inconsistent handling of eltype Decimals.Decimal (with silent errors?)

2022-09-05 Thread GitBox


svilupp opened a new issue, #335:
URL: https://github.com/apache/arrow-julia/issues/335

   First of all, thank you for the amazing package! I have noticed some 
unexpected behaviour that I wanted to point out.

   **Expected behaviour:** decimal numbers like 1.0 and 0.1 are represented as 
floats; they can be saved and loaded again.

   **Actual behaviour:**
   When writing a column with eltype Decimals.Decimal, `Arrow.write(filename, df)` 
gives a MethodError (see below), while `Arrow.write(filename, df; compress=:lz4)` 
completes without an error but the resulting table is wrong when re-read (see the 
MWE below).

   I've had a quick look at the code base and I cannot see any type checks - 
are those left to the user / MethodErrors?
   MWE:
   ```julia
   using Decimals
   using DataFrames, Arrow

   df = DataFrame(:a => [Decimal(2.0)])

   # this will fail with an error that Decimal cannot be written
   Arrow.write("test.feather", df)
   # nested task error: MethodError: no method matching write(::IOBuffer,
   # ::Decimals.Decimal)

   # this will succeed
   Arrow.write("test.feather", df; compress=:lz4)

   # but the loaded dataframe will be rubbish
   df2 = Arrow.Table("test.feather") |> DataFrame
   # 1×1 DataFrame
   #  Row │ a
   #      │ Float64
   # ─────┼─────────────
   #    1 │ 2.1509e-314
   ```
   
   Error stack trace from Arrow.write() without a keyword argument:
   ```
   ERROR: TaskFailedException
   Stacktrace:
     [1] wait
       @ ./task.jl:345 [inlined]
     [2] close(writer::Arrow.Writer{IOStream})
       @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:230
     [3] open(::Arrow.var"#120#121"{DataFrame}, ::Type, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:file,), Tuple{Bool}}})
       @ Base ./io.jl:386
     [4] #write#119
       @ ~/.julia/packages/Arrow/ZlMFU/src/write.jl:57 [inlined]
     [5] write(file_path::String, tbl::DataFrame)
       @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:56
     [6] top-level scope
       @ REPL[14]:1

       nested task error: MethodError: no method matching write(::IOBuffer, ::Decimals.Decimal)
       Closest candidates are:
         write(::IO, ::Any) at io.jl:672
         write(::IO, ::Any, ::Any...) at io.jl:673
         write(::Base.GenericIOBuffer, ::UInt8) at iobuffer.jl:442
         ...
       Stacktrace:
         [1] write(io::IOBuffer, x::Decimals.Decimal)
           @ Base ./io.jl:672
         [2] writearray(io::IOStream, #unused#::Type{Decimals.Decimal}, col::Vector{Union{Missing, Decimals.Decimal}})
           @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/utils.jl:50
         [3] writebuffer(io::IOStream, col::Arrow.Primitive{Union{Missing, Decimals.Decimal}, Vector{Union{Missing, Decimals.Decimal}}}, alignment::Int64)
           @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/arraytypes/primitive.jl:102
         [4] write(io::IOStream, msg::Arrow.Message, blocks::Tuple{Vector{Arrow.Block}, Vector{Arrow.Block}}, sch::Base.RefValue{Tables.Schema}, alignment::Int64)
           @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:365
         [5] macro expansion
           @ ~/.julia/packages/Arrow/ZlMFU/src/write.jl:149 [inlined]
         [6] (::Arrow.var"#122#124"{IOStream, Int64, Tuple{Vector{Arrow.Block}, Vector{Arrow.Block}}, Base.RefValue{Tables.Schema}, Arrow.OrderedChannel{Arrow.Message}})()
           @ Arrow ./threadingconstructs.jl:258
   ```
   
   
   **Package versions**
   ```
   [69666777] Arrow v2.3.0
   [a93c6f00] DataFrames v1.3.4
   [194296ae] LibPQ v1.14.0
   ```
   **versioninfo()** (but it was the same on 1.7)
   ```
   Julia Version 1.8.0
   Commit 5544a0fab76 (2022-08-17 13:38 UTC)
   Platform Info:
     OS: macOS (arm64-apple-darwin21.3.0)
     CPU: 8 × Apple M1 Pro
     WORD_SIZE: 64
     LIBM: libopenlibm
     LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
   Threads: 6 on 6 virtual cores
   ```





[jira] [Created] (ARROW-17616) [CI][Java] Java nightly upload job fails after introduction of pruning

2022-09-05 Thread Jacob Wujciak-Jens (Jira)
Jacob Wujciak-Jens created ARROW-17616:

 Summary: [CI][Java] Java nightly upload job fails after 
introduction of pruning
 Key: ARROW-17616
 URL: https://issues.apache.org/jira/browse/ARROW-17616
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Java
Reporter: Jacob Wujciak-Jens


The nightly Java upload job has been failing ever since [ARROW-17293]:
https://github.com/apache/arrow/actions/workflows/java_nightly.yml

It looks like the "Build Repository" step clashes with the synced repo?





[jira] [Created] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package

2022-09-05 Thread Jira
Raúl Cumplido created ARROW-17615:

 Summary: [CI][Packaging] arrow-cpp on conda nightlies fail finding 
Arrow package
 Key: ARROW-17615
 URL: https://issues.apache.org/jira/browse/ARROW-17615
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Raúl Cumplido
Assignee: Raúl Cumplido


Trying to find the Arrow package using our current nightly arrow-cpp package on 
conda raises the following:
{code:java}
$ cmake . -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 10.4.0
-- The CXX compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - 
skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: 
/home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ - 
skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:10 (find_package):
  Found package configuration file:

    /home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake

  but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be NOT
  FOUND.
-- Configuring incomplete, errors occurred!
See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
{code}
The CMakeLists.txt file to reproduce is:
{code:java}
cmake_minimum_required(VERSION 3.19)
project(arrow-test)

set(CMAKE_CXX_STANDARD 17)

if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
endif()

# Add Arrow
find_package(Arrow REQUIRED COMPONENTS dataset flight parquet)
{code}
The conda package was created with the following environment:
{code:java}
name: cookbook-cpp-dev
channels:
  - arrow-nightlies
  - conda-forge
dependencies:
  - python=3.9
  - compilers
  - arrow-nightlies::arrow-cpp >9
  - sphinx
  - gtest
  - gmock
  - arrow-nightlies::pyarrow >9
  - clang-tools
{code}
The compilation succeeds when using arrow-cpp 9.0.0 from conda-forge instead of 
the arrow-nightlies channel.





[jira] [Created] (ARROW-17614) [CI][Python] test test_write_dataset_max_rows_per_file is producing several nightly build failures

2022-09-05 Thread Jira
Raúl Cumplido created ARROW-17614:

 Summary: [CI][Python] test test_write_dataset_max_rows_per_file is 
producing several nightly build failures
 Key: ARROW-17614
 URL: https://issues.apache.org/jira/browse/ARROW-17614
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Python
Reporter: Raúl Cumplido


The following failure has been seen on multiple nightly builds:
{code:java}
_____________________ test_write_dataset_max_rows_per_file _____________________

tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_write_dataset_max_rows_pe0')

    @pytest.mark.parquet
    def test_write_dataset_max_rows_per_file(tempdir):
        directory = tempdir / 'ds'
        max_rows_per_file = 10
        max_rows_per_group = 10
        num_of_columns = 2
        num_of_records = 35

        record_batch = _generate_data_and_columns(num_of_columns,
                                                  num_of_records)

        ds.write_dataset(record_batch, directory, format="parquet",
                         max_rows_per_file=max_rows_per_file,
>                        max_rows_per_group=max_rows_per_group)

usr/local/lib/python3.7/site-packages/pyarrow/tests/test_dataset.py:3921:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
usr/local/lib/python3.7/site-packages/pyarrow/dataset.py:992: in write_dataset
    min_rows_per_group, max_rows_per_group, create_dir
pyarrow/_dataset.pyx:2811: in pyarrow._dataset._filesystemdataset_write
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   FileNotFoundError: [Errno 2] Failed to open local file
'/tmp/pytest-of-root/pytest-0/test_write_dataset_max_rows_pe0/ds/part-1.parquet'.
Detail: [errno 2] No such file or directory
{code}
Examples of failed builds:

[verify-rc-source-python-macos-conda-amd64|https://github.com/ursacomputing/crossbow/runs/8176702861?check_suite_focus=true]

[wheel-manylinux2014-cp37-amd64|https://github.com/ursacomputing/crossbow/runs/8175319639?check_suite_focus=true]

It seems flaky, as some nightly jobs executed on a previous day without new 
commits were successful.





[jira] [Created] (ARROW-17613) [C++] Add function execution API for a preconfigured kernel

2022-09-05 Thread Yaron Gvili (Jira)
Yaron Gvili created ARROW-17613:

 Summary: [C++] Add function execution API for a preconfigured 
kernel
 Key: ARROW-17613
 URL: https://issues.apache.org/jira/browse/ARROW-17613
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Yaron Gvili
Assignee: Yaron Gvili


Currently, the function execution API goes through kernel selection on each 
invocation. This issue will add a faster path for executing a preconfigured 
kernel; see the sketch below.
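
For context, a sketch of the current path and the proposed one. {{CallFunction}} 
is the existing API; the preconfigured-kernel names below are hypothetical, since 
this issue is what would add them:

{code:cpp}
#include <arrow/api.h>
#include <arrow/compute/api.h>

namespace cp = arrow::compute;

// Current path: every invocation resolves "add" in the function registry
// and re-runs kernel selection for the argument types.
arrow::Result<arrow::Datum> AddToday(const arrow::Datum& a,
                                     const arrow::Datum& b) {
  return cp::CallFunction("add", {a, b});
}

// Proposed faster path (hypothetical, illustrative names only): select the
// kernel once up front, then execute it repeatedly without per-call dispatch.
//
//   ARROW_ASSIGN_OR_RAISE(auto kernel, function->DispatchExact(arg_types));
//   for (const auto& batch : batches) {
//     ARROW_ASSIGN_OR_RAISE(auto out, ExecuteKernel(kernel, batch, &ctx));
//   }
{code}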


