[jira] [Created] (ARROW-11663) [DataFusion] Master does not compile
Jorge Leitão created ARROW-11663: Summary: [DataFusion] Master does not compile Key: ARROW-11663 URL: https://issues.apache.org/jira/browse/ARROW-11663 Project: Apache Arrow Issue Type: Bug Components: Rust - DataFusion Reporter: Jorge Leitão Assignee: Jorge Leitão -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11662) [C++] Support sorting for decimal data type.
ZMZ91 created ARROW-11662: Summary: [C++] Support sorting for decimal data type. Key: ARROW-11662 URL: https://issues.apache.org/jira/browse/ARROW-11662 Project: Apache Arrow Issue Type: Wish Components: C++ Reporter: ZMZ91 Sorting on decimal data does not seem to be implemented as of 3.0.0. Is there a specific reason it is not supported? Is an implementation on the roadmap? Is there any workaround for sorting decimal data in the meantime?
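One possible workaround, sketched under assumptions: a decimal128 array stores each value as a 128-bit unscaled integer, and the scale is fixed by the array's type. When every value shares that scale, sorting the unscaled integers sorts the decimals. The sketch models the column as a plain Python list of unscaled ints (None for null); `decimal_sort_indices` and `to_decimal` are hypothetical helpers, not Arrow APIs.

```python
from decimal import Decimal
from typing import List, Optional


def decimal_sort_indices(unscaled: List[Optional[int]]) -> List[int]:
    """Return indices that sort a fixed-scale decimal column ascending, nulls last.

    Each entry is the column's unscaled integer (value * 10**scale); None is null.
    All entries share one scale, so ordering the integers orders the decimals.
    """
    valid = [i for i, v in enumerate(unscaled) if v is not None]
    nulls = [i for i, v in enumerate(unscaled) if v is None]
    # list.sort is stable, so equal values keep their original relative order
    valid.sort(key=lambda i: unscaled[i])
    return valid + nulls


def to_decimal(unscaled_value: int, scale: int) -> Decimal:
    """Recover the logical value, i.e. unscaled_value * 10**-scale."""
    return Decimal(unscaled_value).scaleb(-scale)


# 1.50, -0.25, null, 10.00 stored with scale 2
column = [150, -25, None, 1000]
order = decimal_sort_indices(column)
print(order)  # [1, 0, 3, 2]
print([to_decimal(column[i], 2) for i in order if column[i] is not None])
```

Nulls are placed last here by choice; a real sort kernel would typically make null placement configurable.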
[jira] [Created] (ARROW-11661) [C++] Compilation failure in arrow/scalar.cc on Xcode 8.3.3
Wes McKinney created ARROW-11661: Summary: [C++] Compilation failure in arrow/scalar.cc on Xcode 8.3.3 Key: ARROW-11661 URL: https://issues.apache.org/jira/browse/ARROW-11661 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney See https://gist.github.com/wesm/e3b52381de1556f2af669c7e2458afd0 It seems that this template construct is not robustly supported by older compilers:
{code}
// timestamp to string
Status CastImpl(const TimestampScalar& from, StringScalar* to) {
  to->value = FormatToBuffer(internal::StringFormatter{}, from);
  return Status::OK();
}

// date to string
template
Status CastImpl(const DateScalar& from, StringScalar* to) {
  TimestampScalar ts({}, timestamp(TimeUnit::MILLI));
  RETURN_NOT_OK(CastImpl(from, &ts));
  return CastImpl(ts, to);
}

// string to any
template
Status CastImpl(const StringScalar& from, ScalarType* to) {
  ARROW_ASSIGN_OR_RAISE(auto out,
                        Scalar::Parse(to->type, util::string_view(*from.value)));
  to->value = std::move(checked_cast(*out).value);
  return Status::OK();
}

// binary to string
Status CastImpl(const BinaryScalar& from, StringScalar* to) {
  to->value = from.value;
  return Status::OK();
}

// formattable to string
template ,
          // note: Value unused but necessary to trigger SFINAE if Formatter is
          // undefined
          typename Value = typename Formatter::value_type>
Status CastImpl(const ScalarType& from, StringScalar* to) {
  to->value = FormatToBuffer(Formatter{from.type}, from);
  return Status::OK();
}
{code}
[jira] [Created] (ARROW-11660) [C++] Move RecordBatch::SelectColumns method from R to C++ library
Neal Richardson created ARROW-11660: Summary: [C++] Move RecordBatch::SelectColumns method from R to C++ library Key: ARROW-11660 URL: https://issues.apache.org/jira/browse/ARROW-11660 Project: Apache Arrow Issue Type: New Feature Components: C++, R Reporter: Neal Richardson Fix For: 4.0.0 Table has a proper SelectColumns method in the C++ library, but the RecordBatch equivalent lives in the R package and should be pushed down to C++.
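To make the requested semantics concrete, here is a minimal sketch of a column-selection method, assuming it should behave like Table::SelectColumns: take a list of column indices, reject any out-of-range index, and return a new batch whose fields and columns follow the requested order. `MiniRecordBatch` and `select_columns` are hypothetical Python stand-ins, not the C++ or R API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MiniRecordBatch:
    """Hypothetical stand-in for a record batch: field names plus equal-length columns."""
    names: List[str]
    columns: List[list]

    def select_columns(self, indices: Sequence[int]) -> "MiniRecordBatch":
        # Validate every index up front so a bad index never yields a partial batch
        for i in indices:
            if not 0 <= i < len(self.columns):
                raise IndexError(f"Invalid column index {i} to select columns")
        # Reuse (do not copy) the selected column objects, in the requested order
        return MiniRecordBatch(
            names=[self.names[i] for i in indices],
            columns=[self.columns[i] for i in indices],
        )


batch = MiniRecordBatch(["id", "name", "score"], [[1, 2], ["a", "b"], [0.5, 0.9]])
subset = batch.select_columns([2, 0])
print(subset.names)  # ['score', 'id']
```

Sharing rather than copying the selected columns is an assumption about the desired behavior; it keeps selection cheap regardless of batch size.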
[jira] [Created] (ARROW-11659) [R] Preserve group_by .drop argument
Neal Richardson created ARROW-11659: Summary: [R] Preserve group_by .drop argument Key: ARROW-11659 URL: https://issues.apache.org/jira/browse/ARROW-11659 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Neal Richardson Fix For: 4.0.0
[jira] [Created] (ARROW-11658) [R] Handle mutate/rename inside group_by
Neal Richardson created ARROW-11658: Summary: [R] Handle mutate/rename inside group_by Key: ARROW-11658 URL: https://issues.apache.org/jira/browse/ARROW-11658 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Neal Richardson Fix For: 4.0.0 Followup to ARROW-11657
[jira] [Created] (ARROW-11657) [R] group_by with .drop specified errors
Neal Richardson created ARROW-11657: Summary: [R] group_by with .drop specified errors Key: ARROW-11657 URL: https://issues.apache.org/jira/browse/ARROW-11657 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 4.0.0 cf. https://github.com/tidyverse/dplyr/issues/5763
[jira] [Created] (ARROW-11655) Pad/trim functions
Mike Seddon created ARROW-11655: Summary: Pad/trim functions Key: ARROW-11655 URL: https://issues.apache.org/jira/browse/ARROW-11655 Project: Apache Arrow Issue Type: Sub-task Reporter: Mike Seddon Assignee: Mike Seddon The pad and trim functions.
[jira] [Created] (ARROW-11656) Left over functions/fixes
Mike Seddon created ARROW-11656: Summary: Left over functions/fixes Key: ARROW-11656 URL: https://issues.apache.org/jira/browse/ARROW-11656 Project: Apache Arrow Issue Type: Sub-task Reporter: Mike Seddon Assignee: Mike Seddon
[jira] [Created] (ARROW-11654) Regex functions
Mike Seddon created ARROW-11654: Summary: Regex functions Key: ARROW-11654 URL: https://issues.apache.org/jira/browse/ARROW-11654 Project: Apache Arrow Issue Type: Sub-task Reporter: Mike Seddon Assignee: Mike Seddon The Postgres regexp functions.
[jira] [Created] (ARROW-11653) Ascii/unicode functions
Mike Seddon created ARROW-11653: Summary: Ascii/unicode functions Key: ARROW-11653 URL: https://issues.apache.org/jira/browse/ARROW-11653 Project: Apache Arrow Issue Type: Sub-task Reporter: Mike Seddon Implement the Postgres Ascii/Unicode functions
[jira] [Created] (ARROW-11652) Signature::OneOf
Mike Seddon created ARROW-11652: Summary: Signature::OneOf Key: ARROW-11652 URL: https://issues.apache.org/jira/browse/ARROW-11652 Project: Apache Arrow Issue Type: Sub-task Reporter: Mike Seddon Assignee: Mike Seddon There needs to be a way to define a function signature that accepts one of several strict argument-type options, e.g. `lpad` accepts either [string, int] or [string, int, string].
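A minimal sketch of the matching rule this implies, assuming "strict" means the argument types must equal one alternative exactly, with no coercion: the signature is a list of allowed type lists, and the first exact match wins. `LPAD_SIGNATURE` and `match_one_of` are hypothetical names; DataFusion's actual type representation and error handling will differ.

```python
from typing import List, Sequence

# Hypothetical signature for lpad: exactly one of these argument-type lists is accepted
LPAD_SIGNATURE = [["string", "int"], ["string", "int", "string"]]


def match_one_of(alternatives: Sequence[Sequence[str]], arg_types: Sequence[str]) -> List[str]:
    """Return the first alternative that matches arg_types exactly (no coercion)."""
    for candidate in alternatives:
        if list(candidate) == list(arg_types):
            return list(candidate)
    expected = " or ".join(str(list(c)) for c in alternatives)
    raise TypeError(f"no signature matches {list(arg_types)}; expected {expected}")


print(match_one_of(LPAD_SIGNATURE, ["string", "int", "string"]))
```

Because the options are strict, a call such as `lpad(int, int)` is rejected outright rather than coerced; a looser design would try casting arguments toward each alternative before giving up.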
[jira] [Created] (ARROW-11651) Postgres Length Functions
Mike Seddon created ARROW-11651: Summary: Postgres Length Functions Key: ARROW-11651 URL: https://issues.apache.org/jira/browse/ARROW-11651 Project: Apache Arrow Issue Type: Sub-task Reporter: Mike Seddon Assignee: Mike Seddon To break up the large PR, this sub-task covers just the Postgres length functions.
[jira] [Created] (ARROW-11650) [Rust][DataFusion] Add Postgres License
Mike Seddon created ARROW-11650: Summary: [Rust][DataFusion] Add Postgres License Key: ARROW-11650 URL: https://issues.apache.org/jira/browse/ARROW-11650 Project: Apache Arrow Issue Type: Improvement Components: Rust - DataFusion Reporter: Mike Seddon Assignee: Mike Seddon DataFusion aims for compatibility with PostgreSQL. To achieve this, parts of the DataFusion code base may reproduce code and documentation from the PostgreSQL project, so the license needs to reflect this.
[jira] [Created] (ARROW-11649) Add support for null_fallback to R
Weston Pace created ARROW-11649: Summary: Add support for null_fallback to R Key: ARROW-11649 URL: https://issues.apache.org/jira/browse/ARROW-11649 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Weston Pace Assignee: Weston Pace ARROW-10438 made it so that a "null" in a partition column is mapped to a special directory with Hive partitioning. By default the value used is __HIVE_DEFAULT_PARTITION__, but other tools that manipulate Hive partitions (e.g. Spark) allow it to be configured, so it needs to be configurable in Arrow as well. The `null_fallback` option is a string that can be passed to HivePartitioning::HivePartitioning and HivePartitioning::MakeFactory. This option should be exposed in R.
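A sketch of what the option controls, assuming the usual Hive layout of one `key=value` directory per partition column: when formatting a path, a null value is replaced by the `null_fallback` string, and when parsing, that string maps back to null. `format_hive_path` and `parse_hive_segment` are hypothetical helpers; real implementations also escape special characters in values, which this sketch omits.

```python
from typing import Dict, Optional, Tuple

DEFAULT_NULL_FALLBACK = "__HIVE_DEFAULT_PARTITION__"


def format_hive_path(keys: Dict[str, Optional[str]],
                     null_fallback: str = DEFAULT_NULL_FALLBACK) -> str:
    # One "key=value" directory per partition key; a null value uses the fallback name
    return "/".join(f"{k}={null_fallback if v is None else v}" for k, v in keys.items())


def parse_hive_segment(segment: str,
                       null_fallback: str = DEFAULT_NULL_FALLBACK) -> Tuple[str, Optional[str]]:
    # Split "key=value"; the fallback string is read back as a null
    key, _, value = segment.partition("=")
    return key, (None if value == null_fallback else value)


print(format_hive_path({"year": "2021", "country": None}))
print(format_hive_path({"year": "2021", "country": None}, null_fallback="NULL"))
```

Writer and reader must agree on the same `null_fallback` string; otherwise a null written by one tool is read back by another as the literal text.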
[jira] [Created] (ARROW-11648) [Rust]: Json reader for nested dictionary arrays returns empty array instead of array of nulls
Jörn Horstmann created ARROW-11648: Summary: [Rust]: Json reader for nested dictionary arrays returns empty array instead of array of nulls Key: ARROW-11648 URL: https://issues.apache.org/jira/browse/ARROW-11648 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Jörn Horstmann
{code:java}
// schema: id: Utf8, attr: List>
let input = json!([
    {"id": "a"},
    {"id": "b"},
    {"id": "c"},
    {"id": "d"},
    {"id": "e"},
]);
// Results in ArrowError("InvalidArgumentError(\"all columns in a record batch must have the same length\")")
{code}
Probably related to `list_array_string_array_builder` around line 688:
{code:java}
if let Some(value) = row.get(col_name) {
    ...
}
// no else
{code}
Expected: the resulting array should have a length equal to the number of rows, with all nested lists marked as null.
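The expected behavior can be sketched with the Arrow list layout (an offsets array plus a validity bitmap), assuming a plain list-of-strings column instead of the dictionary-encoded one in the report: a row without the key still contributes one slot, as a null list that repeats the previous offset. `build_list_column` is a hypothetical helper, not the Rust reader's code.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_list_column(
    rows: List[Dict[str, Any]], col_name: str
) -> Tuple[List[int], List[str], List[bool]]:
    """Build (offsets, values, validity) for a list-of-strings column from JSON-like rows."""
    offsets = [0]
    values: List[str] = []
    validity: List[bool] = []
    for row in rows:
        items: Optional[List[str]] = row.get(col_name)
        if items is None:
            # Missing key or JSON null: emit a null list slot by repeating the
            # previous offset and clearing the validity bit, instead of skipping the row
            offsets.append(offsets[-1])
            validity.append(False)
        else:
            values.extend(items)
            offsets.append(len(values))
            validity.append(True)
    return offsets, values, validity


rows = [{"id": "a"}, {"id": "b"}, {"id": "c"}, {"id": "d"}, {"id": "e"}]
offsets, values, validity = build_list_column(rows, "attr")
print(len(offsets) - 1, validity)  # 5 [False, False, False, False, False]
```

The array length is `len(offsets) - 1`, so handling the missing-key branch is exactly what keeps it equal to the row count.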
[jira] [Created] (ARROW-11647) [C++][Compute] CastFromNull does not use preallocated buffers
Ben Kietzman created ARROW-11647: Summary: [C++][Compute] CastFromNull does not use preallocated buffers Key: ARROW-11647 URL: https://issues.apache.org/jira/browse/ARROW-11647 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Ben Kietzman Assignee: Ben Kietzman Fix For: 4.0.0 When casting from null, new buffers are currently allocated for every batch of the computation. This is wasteful: for simple types the data buffers are preallocated and the null bitmap is handled separately, so CastFromNull need do no work at all (unless we decide to explicitly zero the data buffer). For variable-length output types the offsets buffer is preallocated and should be zeroed; for struct types preallocation is not implemented (but should be as simple as preallocating each child array).
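A minimal sketch of the in-place approach for a variable-length (utf8) output, assuming 32-bit offsets: instead of allocating, the kernel overwrites the preallocated buffers, zeroing all `length + 1` offsets so that every slot is an empty, null string. `fill_null_cast_to_utf8` is a hypothetical name, and bytearrays stand in for Arrow buffers.

```python
import struct


def fill_null_cast_to_utf8(validity: bytearray, offsets: bytearray, length: int) -> None:
    """Fill preallocated output buffers for casting `length` nulls to a utf8 array.

    No new buffers are allocated; the existing ones are overwritten in place.
    """
    bitmap_bytes = (length + 7) // 8
    # Every output slot is null. (Per the issue, the null bitmap is handled
    # separately in Arrow; it is cleared here only to keep the sketch complete.)
    validity[:bitmap_bytes] = bytes(bitmap_bytes)
    # length + 1 int32 offsets, all zero: every slot has zero bytes of data
    offset_bytes = 4 * (length + 1)
    offsets[:offset_bytes] = bytes(offset_bytes)


validity = bytearray(b"\xff\xff")        # stand-ins for preallocated, uninitialized buffers
offsets = bytearray(b"\x07" * 4 * 11)
fill_null_cast_to_utf8(validity, offsets, 10)
print(struct.unpack("<11i", offsets))    # all zeros
```

For fixed-width outputs the equivalent function would do nothing at all, which is the point of the issue: the only real work is zeroing offsets.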
[jira] [Created] (ARROW-11646) [Python] Allow to create BooleanArray from a list of 0 and 1
quentin lhoest created ARROW-11646: Summary: [Python] Allow to create BooleanArray from a list of 0 and 1 Key: ARROW-11646 URL: https://issues.apache.org/jira/browse/ARROW-11646 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 3.0.0 Reporter: quentin lhoest Currently, a BooleanArray can be created in these ways:
{code:python}
import numpy as np
import pyarrow as pa

pa.array([False, True, True, False])
pa.array(np.array([0, 1, 1, 0]), type=pa.bool_())
pa.array([0, 1, 1, 0]).cast(pa.bool_())
{code}
But creating it this way fails:
{code:python}
pa.array([0, 1, 1, 0], type=pa.bool_())
{code}
{noformat}
---
ArrowInvalid Traceback (most recent call last)
in
> 1 a = pa.array([0, 1, 1, 0], type=pa.bool_())

~/.virtualenvs/hf-datasets/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib.array()

~/.virtualenvs/hf-datasets/lib/python3.7/site-packages/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/.virtualenvs/hf-datasets/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/.virtualenvs/hf-datasets/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Could not convert 0 with type int: tried to convert to boolean
{noformat}
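One way the requested conversion could behave, sketched in plain Python: accept booleans, None, and the integers 0 and 1, and reject anything else with an error similar to the one above. `coerce_to_bool` is a hypothetical helper, not a pyarrow function. Rejecting other integers is a design assumption; treating any nonzero value as True, as integer-to-boolean casts typically do, is the obvious alternative.

```python
from typing import Iterable, List, Optional


def coerce_to_bool(values: Iterable[object]) -> List[Optional[bool]]:
    """Convert a sequence of bools, None, 0 and 1 into Optional[bool] values."""
    out: List[Optional[bool]] = []
    for v in values:
        # bool is a subclass of int in Python, so check it before the int branch
        if v is None or isinstance(v, bool):
            out.append(v)
        elif isinstance(v, int) and v in (0, 1):
            out.append(bool(v))  # accept exactly 0 and 1
        else:
            raise ValueError(
                f"Could not convert {v!r} with type {type(v).__name__}: "
                "tried to convert to boolean"
            )
    return out


print(coerce_to_bool([0, 1, 1, 0]))  # [False, True, True, False]
```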
[jira] [Created] (ARROW-11645) [Rust] Interval is out of spec
Jorge Leitão created ARROW-11645: Summary: [Rust] Interval is out of spec Key: ARROW-11645 URL: https://issues.apache.org/jira/browse/ARROW-11645 Project: Apache Arrow Issue Type: Bug Components: Rust Reporter: Jorge Leitão The Interval DataType is currently implemented as an i32 or i64. However, this is incorrect and does not follow the spec. See https://github.com/apache/arrow/blob/master/format/Schema.fbs
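For reference, a sketch of the two interval layouts Schema.fbs describes at the time of this report, assuming little-endian buffers: YEAR_MONTH is one 32-bit integer counting months, and DAY_TIME is two contiguous 32-bit integers (days, then milliseconds). The helper names are hypothetical. The last line illustrates why treating DAY_TIME as a single i64 is not equivalent: the packed pair reinterpreted as one integer no longer reads as days and milliseconds.

```python
import struct
from typing import Tuple


def pack_year_month(months: int) -> bytes:
    # YEAR_MONTH: one signed 32-bit integer counting elapsed months
    return struct.pack("<i", months)


def pack_day_time(days: int, millis: int) -> bytes:
    # DAY_TIME: two contiguous signed 32-bit integers (days, then milliseconds), 8 bytes total
    return struct.pack("<ii", days, millis)


def unpack_day_time(buf: bytes) -> Tuple[int, int]:
    days, millis = struct.unpack("<ii", buf)
    return days, millis


encoded = pack_day_time(-1, 0)
print(unpack_day_time(encoded))         # (-1, 0)
# The same 8 bytes read as one i64: "-1 day" turns into a large positive number
print(struct.unpack("<q", encoded)[0])  # 4294967295
```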
[jira] [Created] (ARROW-11644) [Python][Parquet] Low-level Python API for Parquet decryption
Itamar Turner-Trauring created ARROW-11644: Summary: [Python][Parquet] Low-level Python API for Parquet decryption Key: ARROW-11644 URL: https://issues.apache.org/jira/browse/ARROW-11644 Project: Apache Arrow Issue Type: Sub-task Reporter: Itamar Turner-Trauring Assignee: Itamar Turner-Trauring This will cover decryption only, using the low-level crypto API.
[jira] [Created] (ARROW-11643) [C++] protobuf_ep failure on Xcode 8.3.3 / Apple LLVM 8.1
Wes McKinney created ARROW-11643: Summary: [C++] protobuf_ep failure on Xcode 8.3.3 / Apple LLVM 8.1 Key: ARROW-11643 URL: https://issues.apache.org/jira/browse/ARROW-11643 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Wes McKinney I randomly decided to see if we can still build and run on a pre-SSE4.2 machine (2009-era MacBook), but protobuf_ep fails with:
{code}
FAILED: CMakeFiles/libprotobuf.dir/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc.o
/Applications/Xcode8.3.3.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DGOOGLE_PROTOBUF_CMAKE_BUILD -DHAVE_PTHREAD -DHAVE_ZLIB -I. -I/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src -Qunused-arguments -fcolor-diagnostics -O3 -DNDEBUG -O3 -DNDEBUG -fPIC -Qunused-arguments -fcolor-diagnostics -O3 -DNDEBUG -O3 -DNDEBUG -fPIC -std=c++11 -MD -MT CMakeFiles/libprotobuf.dir/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc.o -MF CMakeFiles/libprotobuf.dir/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc.o.d -o CMakeFiles/libprotobuf.dir/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc.o -c /Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc
In file included from /Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/dynamic_message.cc:80:
/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/map_field.h:332:37: error: constexpr constructor never produces a constant expression [-Winvalid-constexpr]
  explicit PROTOBUF_MAYBE_CONSTEXPR MapFieldBase(ConstantInitialized)
                                    ^
/Users/wesm/code/arrow/cpp/build/protobuf_ep-prefix/src/protobuf_ep/src/google/protobuf/map_field.h:335:9: note: non-literal type 'internal::WrappedMutex' cannot be used in a constant expression
        mutex_(GOOGLE_PROTOBUF_LINKER_INITIALIZED),
        ^
1 error generated.
{code}
Since this error is reported under a warning flag (-Winvalid-constexpr), perhaps it can be suppressed.
[jira] [Created] (ARROW-11642) [C++] Incorrect preprocessor directive for Windows
Markus Silberstein Hont created ARROW-11642: Summary: [C++] Incorrect preprocessor directive for Windows Key: ARROW-11642 URL: https://issues.apache.org/jira/browse/ARROW-11642 Project: Apache Arrow Issue Type: Bug Reporter: Markus Silberstein Hont In the HDFS connector library, per-platform preprocessor directives determine which libjvm and libhdfs binaries to use for file operations. As of 3.0.0 this fails on Windows because of an incorrect preprocessor directive: the code checks for "__WIN32", but the predefined macro is "_WIN32" (see https://docs.microsoft.com/en-us/cpp/preprocessor/predefined-macros?view=msvc-160).
[jira] [Created] (ARROW-11641) [CI]: Use docker buildkit's inline cache to reuse build cache across different hosts
Krisztian Szucs created ARROW-11641: Summary: [CI]: Use docker buildkit's inline cache to reuse build cache across different hosts Key: ARROW-11641 URL: https://issues.apache.org/jira/browse/ARROW-11641 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Krisztian Szucs Assignee: Krisztian Szucs