[jira] [Created] (ARROW-10203) Capture guidance for endianness support in contributors guide.
Micah Kornfield created ARROW-10203: --- Summary: Capture guidance for endianness support in contributors guide. Key: ARROW-10203 URL: https://issues.apache.org/jira/browse/ARROW-10203 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Micah Kornfield Assignee: Micah Kornfield https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3ccak7z5t--hhhr9dy43pyhd6m-xou4qogwqvlwzsg-koxxjpt...@mail.gmail.com%3e -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10202) [CI][Windows] Use sf.net mirror for MSYS2
Kouhei Sutou created ARROW-10202: Summary: [CI][Windows] Use sf.net mirror for MSYS2 Key: ARROW-10202 URL: https://issues.apache.org/jira/browse/ARROW-10202 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10201) [C++][CI] Disable S3 in arm64 job on Travis CI
Kouhei Sutou created ARROW-10201: Summary: [C++][CI] Disable S3 in arm64 job on Travis CI Key: ARROW-10201 URL: https://issues.apache.org/jira/browse/ARROW-10201 Project: Apache Arrow Issue Type: Improvement Components: C++, Continuous Integration Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10200) [Java][CI] Fix failure of Java CI on s390x
Kazuaki Ishizaki created ARROW-10200: Summary: [Java][CI] Fix failure of Java CI on s390x Key: ARROW-10200 URL: https://issues.apache.org/jira/browse/ARROW-10200 Project: Apache Arrow Issue Type: Bug Affects Versions: 2.0.0 Reporter: Kazuaki Ishizaki ARROW-9701 causes the following failure because the library {{libprotobuf.so.18}} is missing.
{code:java}
...
[ERROR] PROTOC FAILED: /arrow/java/flight/flight-core/target/protoc-plugins/protoc-3.7.1-linux-s390_64.exe: error while loading shared libraries: libprotobuf.so.18: cannot open shared object file: No such file or directory
[ERROR] /arrow/java/flight/flight-core/../../../format/Flight.proto [0:0]: /arrow/java/flight/flight-core/target/protoc-plugins/protoc-3.7.1-linux-s390_64.exe: error while loading shared libraries: libprotobuf.so.18: cannot open shared object file: No such file or directory
...
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10199) [Rust][Parquet] Release Parquet at crates.io to remove debug prints
Krzysztof Stanisławek created ARROW-10199: - Summary: [Rust][Parquet] Release Parquet at crates.io to remove debug prints Key: ARROW-10199 URL: https://issues.apache.org/jira/browse/ARROW-10199 Project: Apache Arrow Issue Type: Wish Components: Rust Affects Versions: 1.0.1 Reporter: Krzysztof Stanisławek The version of Parquet released to docs.rs & crates.io has debug prints in [https://docs.rs/crate/parquet/1.0.1/source/src/column/writer.rs] (line 30). They were pretty hard to track down, so I suggest considering a logging crate in the future. When is the new version going to be released? Is there a stable schedule I can expect? Is it recommended to use the current snapshot straight from GitHub instead of crates.io? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10198) [Dev] Python merge script doesn't close PRs if not merged on master
Neville Dipale created ARROW-10198: -- Summary: [Dev] Python merge script doesn't close PRs if not merged on master Key: ARROW-10198 URL: https://issues.apache.org/jira/browse/ARROW-10198 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Affects Versions: 1.0.1 Reporter: Neville Dipale When the merge script is used to merge PRs against non-master branches, the PR on GitHub doesn't get closed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10197) [Gandiva][python] Execute expression on filtered data
Kirill Lykov created ARROW-10197: Summary: [Gandiva][python] Execute expression on filtered data Key: ARROW-10197 URL: https://issues.apache.org/jira/browse/ARROW-10197 Project: Apache Arrow Issue Type: Improvement Components: C++ - Gandiva, Python Reporter: Kirill Lykov It looks like there is no way to execute an expression on filtered data in Python. Specifically, I cannot pass a `SelectionVector` to the projector's `evaluate` method:
```python
import pyarrow as pa
import pyarrow.gandiva as gandiva

table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
                              pa.array([5., 45., 36., 73., 83., 23., 76.])],
                             ['a', 'b'])

builder = gandiva.TreeExprBuilder()
node_a = builder.make_field(table.schema.field("a"))
node_b = builder.make_field(table.schema.field("b"))
fifty = builder.make_literal(50.0, pa.float64())
eleven = builder.make_literal(11.0, pa.float64())

cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
cond_2 = builder.make_function("greater_than", [node_a, node_b], pa.bool_())
cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
condition = builder.make_condition(cond)

filter = gandiva.make_filter(table.schema, condition)
# filterResult has type SelectionVector
filterResult = filter.evaluate(table.to_batches()[0], pa.default_memory_pool())
print(filterResult)

sum = builder.make_function("add", [node_a, node_b], pa.float64())
field_result = pa.field("c", pa.float64())
expr = builder.make_expression(sum, field_result)
projector = gandiva.make_projector(
    table.schema, [expr], pa.default_memory_pool())

### Here there is a problem that I don't know how to use filterResult with projector
r, = projector.evaluate(table.to_batches()[0], filterResult)
```
In C++, I see that it is possible to pass a SelectionVector as the second argument to projector::Evaluate: [https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270] Meanwhile, it looks like this is impossible in `gandiva.pyx`: [https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10196) [C++] Add Future::DeferNotOk()
Ben Kietzman created ARROW-10196: Summary: [C++] Add Future::DeferNotOk() Key: ARROW-10196 URL: https://issues.apache.org/jira/browse/ARROW-10196 Project: Apache Arrow Issue Type: Improvement Components: C++ Affects Versions: 1.0.1 Reporter: Ben Kietzman Assignee: Ben Kietzman Fix For: 2.0.0 Provide a static method mapping Result<Future<T>> -> Future<T>. If the Result is an error, a finished future containing its Status will be constructed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
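The mapping described in ARROW-10196 can be illustrated with a rough Python analogue. This is only a sketch of the idea, not the Arrow C++ API: `defer_not_ok` is a hypothetical name, an `Exception` instance stands in for an error Result, and `concurrent.futures.Future` stands in for Arrow's Future.

```python
from concurrent.futures import Future


def defer_not_ok(maybe_future):
    """Hypothetical analogue: collapse 'error or Future' into a Future."""
    if isinstance(maybe_future, Exception):
        # Error case: hand back a future that is already finished
        # and carries the error.
        finished = Future()
        finished.set_exception(maybe_future)
        return finished
    # Success case: pass the wrapped future through unchanged.
    return maybe_future


failed = defer_not_ok(ValueError("could not start task"))
pending = Future()
passed_through = defer_not_ok(pending)
```

Callers can then treat both cases uniformly as a future, instead of checking for an error before they have a future to wait on.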
[jira] [Created] (ARROW-10195) [C++] Add string struct extract kernel using re2
Maarten Breddels created ARROW-10195: Summary: [C++] Add string struct extract kernel using re2 Key: ARROW-10195 URL: https://issues.apache.org/jira/browse/ARROW-10195 Project: Apache Arrow Issue Type: New Feature Reporter: Maarten Breddels Assignee: Maarten Breddels Similar to pandas' str.extract: a way to convert a string to a struct of strings using the re2 regex library (when the pattern has named capture groups). -- This message was sent by Atlassian Jira (v8.3.4#803005)
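The behaviour requested in ARROW-10195 can be sketched in plain Python. This is an illustration under assumptions, not the proposed kernel: Python's `re` module stands in for re2, a dict stands in for a struct value, `extract_struct` is a hypothetical helper name, and the choice to emit a null for non-matching input is an assumption.

```python
import re


def extract_struct(values, pattern):
    """Map each string to a dict keyed by the pattern's named groups."""
    regex = re.compile(pattern)
    names = list(regex.groupindex)  # named groups, in definition order
    out = []
    for value in values:
        match = None if value is None else regex.search(value)
        # Assumption: null input or no match produces a null struct.
        out.append(None if match is None
                   else {name: match.group(name) for name in names})
    return out


rows = extract_struct(["a1", "b22", "zzz", None],
                      r"(?P<letter>[ab])(?P<digits>\d+)")
print(rows)
```

The named groups `letter` and `digits` become the field names of each resulting struct-like value.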
[jira] [Created] (ARROW-10194) [Python] Array.to_numpy() with type fixed_size_list(int64(), 1) doesn't roundtrip for large integer values
Krisztian Szucs created ARROW-10194: --- Summary: [Python] Array.to_numpy() with type fixed_size_list(int64(), 1) doesn't roundtrip for large integer values Key: ARROW-10194 URL: https://issues.apache.org/jira/browse/ARROW-10194 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Krisztian Szucs Reproducer:
{code:python}
import pyarrow as pa

data = [None, [9007199254740993]]
arr = pa.array(data, type=pa.list_(pa.uint64(), 1))
ndarray = arr.to_numpy(zero_copy_only=False)
restored = pa.array(ndarray, type=arr.type)
assert restored.equals(arr)
{code}
Error:
{code}
E assert False
E+ where False = (\n[\n null,\n [\n90071992547409944\n ]\n])
E+where = \n[\n null,\n [\n90071992547409952\n ]\n].equals
{code}
The inner numpy array ({{ndarray[1]}}) has a float64 dtype, which cannot represent the integer exactly, so the value is rounded.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
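The precision loss behind ARROW-10194 can be shown without pyarrow or numpy. The sketch below uses Python's built-in `float`, which is an IEEE 754 double like numpy's float64, and the value from the reproducer.

```python
value = 9007199254740993        # 2**53 + 1, the value from the reproducer
as_float = float(value)         # convert to a 64-bit floating point number
round_tripped = int(as_float)   # convert back to an integer

print(value == 2**53 + 1)       # True
print(round_tripped)            # 9007199254740992
print(round_tripped == value)   # False: the value did not survive
```

A double has a 53-bit significand, so 2**53 + 1 is the first positive integer it cannot represent exactly.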
[jira] [Created] (ARROW-10193) [Python] Segfault when converting to fixed size binary array
Krisztian Szucs created ARROW-10193: --- Summary: [Python] Segfault when converting to fixed size binary array Key: ARROW-10193 URL: https://issues.apache.org/jira/browse/ARROW-10193 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Krisztian Szucs Assignee: Krisztian Szucs Fix For: 2.0.0 Reproducer:
{code:python}
import pyarrow as pa

data = [b'\x19h\r\x9e\x00\x00\x00\x00\x01\x9b\x9fA']
assert len(data[0]) == 12
ty = pa.binary(12)
arr = pa.array(data, type=ty)
{code}
Trace:
{code}
pyarrow/tests/test_convert_builtin.py::test_fixed_size_binary_length_check
../src/arrow/array/builder_binary.cc:53: Check failed: (size) == (byte_width_) Appending wrong size to FixedSizeBinaryBuilder
0 libarrow.200.0.0.dylib 0x00010e7f9704 _ZN5arrow4util7CerrLog14PrintBackTraceEv + 52
1 libarrow.200.0.0.dylib 0x00010e7f9622 _ZN5arrow4util7CerrLogD2Ev + 98
2 libarrow.200.0.0.dylib 0x00010e7f9585 _ZN5arrow4util7CerrLogD1Ev + 21
3 libarrow.200.0.0.dylib 0x00010e7f95ac _ZN5arrow4util7CerrLogD0Ev + 28
4 libarrow.200.0.0.dylib 0x00010e7f9492 _ZN5arrow4util8ArrowLogD2Ev + 82
5 libarrow.200.0.0.dylib 0x00010e7f94c5 _ZN5arrow4util8ArrowLogD1Ev + 21
6 libarrow.200.0.0.dylib 0x00010e303ec1 _ZN5arrow22FixedSizeBinaryBuilder14CheckValueSizeEx + 209
7 libarrow.200.0.0.dylib 0x00010e30c361 _ZN5arrow22FixedSizeBinaryBuilder12UnsafeAppendEN6nonstd7sv_lite17basic_string_viewIcNSt3__111char_traitsIc + 49
8 libarrow_python.200.0.0.dylib 0x00010b4efa7d _ZN5arrow2py20PyPrimitiveConverterINS_19FixedSizeBinaryTypeEvE6AppendEP7_object + 813
{code}
The input {{const char*}} value is implicitly cast to string_view, which makes the length check fail in debug builds.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
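One plausible reading of the failure in ARROW-10193, offered here as an assumption rather than a confirmed diagnosis, is that a length derived from a bare {{const char*}} stops at the first NUL byte. The sketch below reproduces that effect in Python with `ctypes`, using the bytes from the reproducer; it does not call into Arrow.

```python
import ctypes

data = b'\x19h\r\x9e\x00\x00\x00\x00\x01\x9b\x9fA'

# Reading the buffer back as a NUL-terminated C string drops everything
# from the first zero byte onward.
as_c_string = ctypes.c_char_p(data).value

print(len(data))         # 12, the width of the fixed size binary type
print(len(as_c_string))  # 4, the length seen through a C string
```

Under that assumption, the builder would be handed a 4-byte value for a 12-byte type, which matches the "Appending wrong size" check in the trace.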
[jira] [Created] (ARROW-10192) [C++][Python] Segfault when converting nested struct array with dictionary field to pandas series
Krisztian Szucs created ARROW-10192: --- Summary: [C++][Python] Segfault when converting nested struct array with dictionary field to pandas series Key: ARROW-10192 URL: https://issues.apache.org/jira/browse/ARROW-10192 Project: Apache Arrow Issue Type: Bug Components: C++, Python Reporter: Krisztian Szucs Reproducer:
{code:python}
import pyarrow as pa

def test_struct_array_with_dictionary_field_to_pandas():
    ty = pa.struct([
        pa.field('dict', pa.dictionary(pa.int64(), pa.int32())),
    ])
    data = [
        {'dict': -1859762450}
    ]
    arr = pa.array(data, type=ty)
    arr.to_pandas()
{code}
Raises SIGSTOP:
{code}
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x7fff6e2b733a libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x7fff6e373e60 libsystem_pthread.dylib`pthread_kill + 430
    frame #2: 0x7fff6e1ce93e libsystem_c.dylib`raise + 26
    frame #3: 0x7fff6e3685fd libsystem_platform.dylib`_sigtramp + 29
    frame #4: 0x00011517adfd libarrow_python.200.0.0.dylib`arrow::py::ConvertStruct(options=0x7f84fc5a0230, data=0x7f84fc59ef18, out_values=0x7f84fc53d140) at arrow_to_pandas.cc:685:54
    frame #5: 0x00011514c642 libarrow_python.200.0.0.dylib`arrow::py::ObjectWriterVisitor::Visit(this=0x7ffee06a1a88, type=0x7f84fc5a00e8) at arrow_to_pandas.cc:1031:12
    frame #6: 0x0001151499c4 libarrow_python.200.0.0.dylib`arrow::Status arrow::VisitTypeInline(type=0x7f84fc5a00e8, visitor=0x7ffee06a1a88) at visitor_inline.h:88:5
    frame #7: 0x000115149305 libarrow_python.200.0.0.dylib`arrow::py::ObjectWriter::CopyInto(this=0x7f84fc5a0228, data=std::__1::shared_ptr::element_type @ 0x7f84fc59ef18 strong=2 weak=1, rel_placement=0) at arrow_to_pandas.cc:1055:12
{code}
{code:cpp}
frame #4: 0x00011517adfd libarrow_python.200.0.0.dylib`arrow::py::ConvertStruct(options=0x7f84fc5a0230, data=0x7f84fc59ef18, out_values=0x7f84fc53d140) at arrow_to_pandas.cc:685:54
   682    if (!arr->field(static_cast<int>(field_idx))->IsNull(i)) {
   683      // Value exists in child array, obtain it
   684      auto array = reinterpret_cast<PyArrayObject*>(fields_data[field_idx].obj());
-> 685      auto ptr = reinterpret_cast<const char*>(PyArray_GETPTR1(array, i));
   686      field_value.reset(PyArray_GETITEM(array, ptr));
   687      RETURN_IF_PYERROR();
   688    } else {
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10191) [Rust] [Parquet] Add roundtrip tests for single column batches
Neville Dipale created ARROW-10191: -- Summary: [Rust] [Parquet] Add roundtrip tests for single column batches Key: ARROW-10191 URL: https://issues.apache.org/jira/browse/ARROW-10191 Project: Apache Arrow Issue Type: Sub-task Components: Rust Affects Versions: 1.0.1 Reporter: Neville Dipale To aid with test coverage and picking up information loss during Parquet and Arrow roundtrips, we can add tests that assert that all supported Arrow datatypes can be written and read correctly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10190) [Website] Add Jorge to list of committers
Jorge Leitão created ARROW-10190: Summary: [Website] Add Jorge to list of committers Key: ARROW-10190 URL: https://issues.apache.org/jira/browse/ARROW-10190 Project: Apache Arrow Issue Type: Improvement Reporter: Jorge Leitão Assignee: Jorge Leitão -- This message was sent by Atlassian Jira (v8.3.4#803005)