[jira] [Created] (ARROW-17624) [C++][Acero] Window Functions add helper classes for frame calculation
Michal Nowakiewicz created ARROW-17624:

Summary: [C++][Acero] Window Functions add helper classes for frame calculation
Key: ARROW-17624
URL: https://issues.apache.org/jira/browse/ARROW-17624
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 10.0.0
Reporter: Michal Nowakiewicz
Assignee: Michal Nowakiewicz
Fix For: 10.0.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17623) [C++][Acero] Window Functions add helper classes for ranking
Michal Nowakiewicz created ARROW-17623:

Summary: [C++][Acero] Window Functions add helper classes for ranking
Key: ARROW-17623
URL: https://issues.apache.org/jira/browse/ARROW-17623
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 10.0.0
Reporter: Michal Nowakiewicz
Fix For: 10.0.0
[jira] [Created] (ARROW-17622) [C++] Order-aware non-sink Fetch Node
Vibhatha Lakmal Abeykoon created ARROW-17622:

Summary: [C++] Order-aware non-sink Fetch Node
Key: ARROW-17622
URL: https://issues.apache.org/jira/browse/ARROW-17622
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon

Considering the existing sink nodes and the newly introduced Fetch node with sort capability, we will only need two nodes, "sort" and "fetch", in the long run, because once ordered execution is integrated some features can be removed. Right now, three nodes do closely related work, which is redundant under the unordered-execution assumption: "order_by_sink", "fetch_sink", and "select_k_sink". Eventually one of them will need to go away, none of them will remain sink nodes, and the sorting behavior will need to be removed from "fetch". The task breakdown needs to be determined; it is better to keep a few sub-tasks.
[jira] [Created] (ARROW-17621) [CI] Audit workflows
Jacob Wujciak-Jens created ARROW-17621:

Summary: [CI] Audit workflows
Key: ARROW-17621
URL: https://issues.apache.org/jira/browse/ARROW-17621
Project: Apache Arrow
Issue Type: Improvement
Components: Continuous Integration
Reporter: Jacob Wujciak-Jens
Assignee: Jacob Wujciak-Jens
Fix For: 10.0.0

Set minimal permissions for the token, check for outdated actions, pin SHAs, etc.
[jira] [Created] (ARROW-17620) [R] as_arrow_array() ignores type argument for StructArrays
François Michonneau created ARROW-17620:

Summary: [R] as_arrow_array() ignores type argument for StructArrays
Key: ARROW-17620
URL: https://issues.apache.org/jira/browse/ARROW-17620
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: François Michonneau

While `Array$create()` respects the types provided by the `type` argument, they are ignored when using `as_arrow_array()`. Compare the output below:

{code:java}
library(arrow, warn.conflicts = FALSE)

dataset <- data.frame(
  a = 1,
  b = 2,
  c = 3
)

types <- struct(a = int16(), b = int32(), c = int64())

as_arrow_array(
  dataset,
  type = types
)$type
#> StructType
#> struct<a: double, b: double, c: double>

Array$create(
  dataset,
  type = types
)$type
#> StructType
#> struct<a: int16, b: int32, c: int64>
{code}

I have identified the bug and will submit a PR.
[jira] [Created] (ARROW-17619) [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer
Rok Mihevc created ARROW-17619:

Summary: [C++][Parquet] Add DELTA_BYTE_ARRAY encoder to Parquet writer
Key: ARROW-17619
URL: https://issues.apache.org/jira/browse/ARROW-17619
Project: Apache Arrow
Issue Type: New Feature
Components: C++, Parquet
Reporter: Rok Mihevc

PARQUET-492 added a DELTA_BYTE_ARRAY decoder, but we don't have an encoder.
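For context, the core of DELTA_BYTE_ARRAY is incremental (front) coding: each value is stored as the length of the prefix it shares with the previous value plus the remaining suffix. The sketch below illustrates only that step in Python; in the actual Parquet format the prefix and suffix lengths are additionally encoded with DELTA_BINARY_PACKED, which is omitted here.

```python
# Conceptual sketch of the front-coding step behind Parquet's
# DELTA_BYTE_ARRAY encoding. Not the Parquet wire format: the real
# encoder also packs the prefix/suffix lengths with DELTA_BINARY_PACKED.

def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i

def encode(values):
    """Encode each value as (shared prefix length, remaining suffix)."""
    prev = b""
    out = []
    for v in values:
        p = shared_prefix_len(prev, v)
        out.append((p, v[p:]))
        prev = v
    return out

def decode(pairs):
    """Rebuild each value from the previous value's prefix plus the suffix."""
    prev = b""
    out = []
    for p, suffix in pairs:
        v = prev[:p] + suffix
        out.append(v)
        prev = v
    return out

data = [b"apple", b"applesauce", b"apply", b"banana"]
encoded = encode(data)
assert decode(encoded) == data
```

The encoding pays off on sorted or highly repetitive byte-array columns, where consecutive values share long prefixes.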
[jira] [Created] (ARROW-17618) [Doc] Add Flight SQL to implementation status page
Antoine Pitrou created ARROW-17618:

Summary: [Doc] Add Flight SQL to implementation status page
Key: ARROW-17618
URL: https://issues.apache.org/jira/browse/ARROW-17618
Project: Apache Arrow
Issue Type: Task
Components: Documentation
Reporter: Antoine Pitrou

At some point, we should probably add a dedicated section for Flight SQL to https://arrow.apache.org/docs/dev/status.html
[jira] [Created] (ARROW-17617) [Doc] Remove experimental marker for Flight RPC in feature matrix
Antoine Pitrou created ARROW-17617:

Summary: [Doc] Remove experimental marker for Flight RPC in feature matrix
Key: ARROW-17617
URL: https://issues.apache.org/jira/browse/ARROW-17617
Project: Apache Arrow
Issue Type: Task
Components: Documentation
Reporter: Antoine Pitrou
Fix For: 10.0.0

In https://arrow.apache.org/docs/dev/status.html, Flight RPC is still marked experimental. We should probably remove that mention, as it was already removed from the corresponding format specification page (https://arrow.apache.org/docs/dev/format/Flight.html).
[GitHub] [arrow-julia] svilupp opened a new issue, #335: Inconsistent handling of eltype Decimals.Decimal (with silent errors?)
svilupp opened a new issue, #335:
URL: https://github.com/apache/arrow-julia/issues/335

First of all, thank you for the amazing package! I have noticed unexpected behaviour that I wanted to point out.

**Expected behaviour:** rational numbers like 1.0 and 0.1 will be represented as Float; they can be saved and loaded again.

**Actual behaviour:** When writing a column with eltype Decimals.Decimal, `Arrow.write(filename, df)` will give a method error (see below) and `Arrow.write(filename, df; compress=:lz4)` will complete without an error, but the resulting table is wrong when re-read (see MWE below). I've had a quick look at the code base and I cannot see any type checks - are those left to the user / MethodErrors?

MWE:
```
using Decimals
using DataFrames, Arrow

df = DataFrame(:a => [Decimal(2.0)])

# this will fail with an error that Decimal cannot be saved
Arrow.write("test.feather", df)
# nested task error: MethodError: no method matching write(::IOBuffer, ::Decimals.Decimal)

# this will succeed
Arrow.write("test.feather", df; compress=:lz4)

# but the loaded dataframe will be rubbish
df2 = Arrow.Table("test.feather") |> DataFrame
# 1×1 DataFrame
#  Row │ a
#      │ Float64
# ─────┼─────────────
#    1 │ 2.1509e-314
```

Error stack trace from Arrow.write() without a keyword argument:

> ERROR: TaskFailedException
> Stacktrace:
>   [1] wait
>     @ ./task.jl:345 [inlined]
>   [2] close(writer::Arrow.Writer{IOStream})
>     @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:230
>   [3] open(::Arrow.var"#120#121"{DataFrame}, ::Type, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:file,), Tuple{Bool}}})
>     @ Base ./io.jl:386
>   [4] #write#119
>     @ ~/.julia/packages/Arrow/ZlMFU/src/write.jl:57 [inlined]
>   [5] write(file_path::String, tbl::DataFrame)
>     @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:56
>   [6] top-level scope
>     @ REPL[14]:1
>
> nested task error: MethodError: no method matching write(::IOBuffer, ::Decimals.Decimal)
> Closest candidates are:
>   write(::IO, ::Any) at io.jl:672
>   write(::IO, ::Any, ::Any...) at io.jl:673
>   write(::Base.GenericIOBuffer, ::UInt8) at iobuffer.jl:442
>   ...
> Stacktrace:
>   [1] write(io::IOBuffer, x::Decimals.Decimal)
>     @ Base ./io.jl:672
>   [2] writearray(io::IOStream, #unused#::Type{Decimals.Decimal}, col::Vector{Union{Missing, Decimals.Decimal}})
>     @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/utils.jl:50
>   [3] writebuffer(io::IOStream, col::Arrow.Primitive{Union{Missing, Decimals.Decimal}, Vector{Union{Missing, Decimals.Decimal}}}, alignment::Int64)
>     @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/arraytypes/primitive.jl:102
>   [4] write(io::IOStream, msg::Arrow.Message, blocks::Tuple{Vector{Arrow.Block}, Vector{Arrow.Block}}, sch::Base.RefValue{Tables.Schema}, alignment::Int64)
>     @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/write.jl:365
>   [5] macro expansion
>     @ ~/.julia/packages/Arrow/ZlMFU/src/write.jl:149 [inlined]
>   [6] (::Arrow.var"#122#124"{IOStream, Int64, Tuple{Vector{Arrow.Block}, Vector{Arrow.Block}}, Base.RefValue{Tables.Schema}, Arrow.OrderedChannel{Arrow.Message}})()
>     @ Arrow ./threadingconstructs.jl:258

**Package version**
[69666777] Arrow v2.3.0
[a93c6f00] DataFrames v1.3.4
[194296ae] LibPQ v1.14.0

**versioninfo()** (but it was the same on 1.7)
Julia Version 1.8.0
Commit 5544a0fab76 (2022-08-17 13:38 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
Threads: 6 on 6 virtual cores

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
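The reporter's question about type checks points at the underlying issue: a writer can either validate element types up front and fail with a clear error, or let an unsupported type fall through to a low-level MethodError (as happens here). The Python sketch below illustrates the fail-fast approach; it is a conceptual illustration only, not Arrow.jl code, and the `SUPPORTED` mapping and `check_column` helper are hypothetical names invented for this example.

```python
# Conceptual sketch (not Arrow.jl code): fail fast with a clear error when a
# column's element type has no known Arrow mapping, instead of surfacing a
# deep MethodError from the low-level write path.
from decimal import Decimal

# Hypothetical mapping of supported element types to Arrow storage types.
SUPPORTED = {int: "int64", float: "float64", str: "utf8", bytes: "binary"}

def check_column(name, values):
    """Raise a descriptive TypeError for any unsupported element type."""
    for v in values:
        if v is not None and type(v) not in SUPPORTED:
            raise TypeError(
                f"column {name!r}: element type {type(v).__name__} has no "
                "Arrow mapping; convert it (e.g. to float) before writing"
            )

check_column("ok", [1.0, 2.0, None])       # passes: float is supported
try:
    check_column("a", [Decimal("2.0")])    # fails with a readable message
except TypeError as e:
    print(e)
```

A check like this would have turned both failure modes in the MWE (the MethodError and the silently corrupted compressed write) into one immediate, explainable error.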
[jira] [Created] (ARROW-17616) [CI][Java] Java nightly upload job fails after introduction of pruning
Jacob Wujciak-Jens created ARROW-17616:

Summary: [CI][Java] Java nightly upload job fails after introduction of pruning
Key: ARROW-17616
URL: https://issues.apache.org/jira/browse/ARROW-17616
Project: Apache Arrow
Issue Type: Bug
Components: Continuous Integration, Java
Reporter: Jacob Wujciak-Jens

The nightly Java upload job has been failing ever since [ARROW-17293]:
https://github.com/apache/arrow/actions/workflows/java_nightly.yml

It looks like the "Build Repository" step clashes with the synced repo?
[jira] [Created] (ARROW-17615) [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package
Raúl Cumplido created ARROW-17615:

Summary: [CI][Packaging] arrow-cpp on conda nightlies fail finding Arrow package
Key: ARROW-17615
URL: https://issues.apache.org/jira/browse/ARROW-17615
Project: Apache Arrow
Issue Type: Bug
Reporter: Raúl Cumplido
Assignee: Raúl Cumplido

Trying to find the Arrow package using our current nightly arrow-cpp package on conda raises the following:

{code:java}
$ cmake . -DCMAKE_BUILD_TYPE=Release
-- The C compiler identification is GNU 10.4.0
-- The CXX compiler identification is GNU 10.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /home/raulcd/miniconda3/envs/cookbook-cpp-dev/bin/x86_64-conda-linux-gnu-c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at CMakeLists.txt:10 (find_package):
  Found package configuration file:

    /home/raulcd/miniconda3/envs/cookbook-cpp-dev/lib/cmake/Arrow/ArrowConfig.cmake

  but it set Arrow_FOUND to FALSE so package "Arrow" is considered to be
  NOT FOUND.

-- Configuring incomplete, errors occurred!
See also "/home/raulcd/code/my-cpp-test/CMakeFiles/CMakeOutput.log".
{code}

The CMakeLists.txt file to reproduce is:

{code:java}
cmake_minimum_required(VERSION 3.19)
project(arrow-test)

set(CMAKE_CXX_STANDARD 17)

if (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -stdlib=libstdc++")
endif()

# Add Arrow
find_package(Arrow REQUIRED COMPONENTS dataset flight parquet)
{code}

The conda package was created with the following environment:

{code:java}
name: cookbook-cpp-dev
channels:
  - arrow-nightlies
  - conda-forge
dependencies:
  - python=3.9
  - compilers
  - arrow-nightlies::arrow-cpp >9
  - sphinx
  - gtest
  - gmock
  - arrow-nightlies::pyarrow >9
  - clang-tools
{code}

The compilation is successful when using arrow-cpp 9.0.0 from conda-forge instead of the arrow-nightlies channel.
[jira] [Created] (ARROW-17614) [CI][Python] test test_write_dataset_max_rows_per_file is producing several nightly build failures
Raúl Cumplido created ARROW-17614:

Summary: [CI][Python] test test_write_dataset_max_rows_per_file is producing several nightly build failures
Key: ARROW-17614
URL: https://issues.apache.org/jira/browse/ARROW-17614
Project: Apache Arrow
Issue Type: Bug
Components: Continuous Integration, Python
Reporter: Raúl Cumplido

The following failure has been seen on multiple nightly builds:

{code:java}
_______________ test_write_dataset_max_rows_per_file _______________

tempdir = PosixPath('/tmp/pytest-of-root/pytest-0/test_write_dataset_max_rows_pe0')

    @pytest.mark.parquet
    def test_write_dataset_max_rows_per_file(tempdir):
        directory = tempdir / 'ds'
        max_rows_per_file = 10
        max_rows_per_group = 10
        num_of_columns = 2
        num_of_records = 35

        record_batch = _generate_data_and_columns(num_of_columns,
                                                  num_of_records)

>       ds.write_dataset(record_batch, directory, format="parquet",
                         max_rows_per_file=max_rows_per_file,
                         max_rows_per_group=max_rows_per_group)

usr/local/lib/python3.7/site-packages/pyarrow/tests/test_dataset.py:3921:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
usr/local/lib/python3.7/site-packages/pyarrow/dataset.py:992: in write_dataset
    min_rows_per_group, max_rows_per_group, create_dir
pyarrow/_dataset.pyx:2811: in pyarrow._dataset._filesystemdataset_write
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   FileNotFoundError: [Errno 2] Failed to open local file '/tmp/pytest-of-root/pytest-0/test_write_dataset_max_rows_pe0/ds/part-1.parquet'. Detail: [errno 2] No such file or directory
{code}

Examples of failed builds:
[verify-rc-source-python-macos-conda-amd64|https://github.com/ursacomputing/crossbow/runs/8176702861?check_suite_focus=true]
[wheel-manylinux2014-cp37-amd64|https://github.com/ursacomputing/crossbow/runs/8175319639?check_suite_focus=true]

It seems flaky, as some nightly jobs executed on a previous day without new commits were successful.
[jira] [Created] (ARROW-17613) [C++] Add function execution API for a preconfigured kernel
Yaron Gvili created ARROW-17613:

Summary: [C++] Add function execution API for a preconfigured kernel
Key: ARROW-17613
URL: https://issues.apache.org/jira/browse/ARROW-17613
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Yaron Gvili
Assignee: Yaron Gvili

Currently, the function execution API goes through kernel selection on each invocation. This issue will add a faster path for executing a preconfigured kernel.
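To illustrate the idea behind the proposal (a minimal conceptual sketch only; the registry, `call_function`, and `bind` names below are hypothetical and not Arrow's C++ API): a function registry normally selects a kernel from the argument types on every call, while a "bind once, execute many" path resolves the kernel a single time up front and then skips re-dispatching.

```python
# Conceptual sketch of per-call kernel selection vs. a preconfigured kernel.
# All names here are hypothetical, invented for this illustration.

# Toy kernel registry keyed by (function name, argument type signature).
KERNELS = {
    ("add", ("int", "int")): lambda a, b: a + b,
    ("add", ("str", "str")): lambda a, b: a + b,  # string concatenation
}

def type_sig(*args):
    return tuple("int" if isinstance(a, int) else "str" for a in args)

def call_function(name, *args):
    # Slow path: kernel selection happens on every invocation.
    return KERNELS[(name, type_sig(*args))](*args)

def bind(name, sig):
    # Fast path: select the kernel exactly once up front...
    kernel = KERNELS[(name, sig)]
    def bound(*args):
        return kernel(*args)  # ...then execute without re-dispatching
    return bound

add_ints = bind("add", ("int", "int"))
assert call_function("add", 1, 2) == 3   # dispatches on this call
assert add_ints(1, 2) == 3               # dispatch already done
```

In a tight loop over many batches, the bound form amortizes the selection cost across all invocations, which is the gain the issue targets.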