[jira] [Created] (ARROW-18047) [Dev][Archery][Crossbow] Queue.put() should use Job.queue setter
Kouhei Sutou created ARROW-18047:
------------------------------------

Summary: [Dev][Archery][Crossbow] Queue.put() should use Job.queue setter
Key: ARROW-18047
URL: https://issues.apache.org/jira/browse/ARROW-18047
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou

This is related to ARROW-18028. The comment bot reports the following error with ARROW-18028: https://github.com/apache/arrow/pull/14409#issuecomment-1278351434

{noformat}
'NoneType' object has no attribute 'github_commit'
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3246777470
{noformat}

https://github.com/apache/arrow/actions/runs/3246777470

{noformat}
ERROR:archery:'NoneType' object has no attribute 'github_commit'
Traceback (most recent call last):
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 153, in handle_issue_comment
    self.handler(command, issue=issue, pull_request=pull,
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 56, in __call__
    return self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 276, in submit
    pull_request.create_issue_comment(report.show())
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/reports.py", line 333, in show
    url=self.task_url(task)
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/reports.py", line 69, in task_url
    if task.status().build_links:
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/core.py", line 869, in status
    github_commit = self._queue.github_commit(self.commit)
AttributeError: 'NoneType' object has no attribute 'github_commit'
{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
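The traceback shows {{self._queue}} being {{None}} when {{Task.status()}} runs. A rough illustration of why the proposed fix helps, with hypothetical class and method names (this is not the actual Crossbow code): when {{Queue.put()}} assigns through the {{Job.queue}} property setter rather than bypassing it, every job handed to the queue carries the back-reference its tasks later dereference.

```python
# Hypothetical sketch: Queue.put() goes through the Job.queue property
# setter instead of poking a private attribute, so the back-reference
# that Task.status() dereferences is always populated.

class Job:
    def __init__(self):
        self._queue = None

    @property
    def queue(self):
        return self._queue

    @queue.setter
    def queue(self, queue):
        self._queue = queue


class Task:
    def __init__(self, job):
        self.job = job

    def status(self):
        # Fails with "'NoneType' object has no attribute 'github_commit'"
        # if the job was never attached to a queue.
        return self.job.queue.github_commit("abc123")


class Queue:
    def github_commit(self, sha):
        return f"commit {sha}"

    def put(self, job):
        # The fix: use the setter rather than writing job._queue directly
        # (or forgetting to set it at all).
        job.queue = self
```

With this shape, `Queue().put(job)` guarantees `job.queue` is set before any task calls `status()`.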
[jira] [Created] (ARROW-18046) [Dev][Archery][Crossbow] Queue.put() should use Job.queue setter
Kouhei Sutou created ARROW-18046:
------------------------------------

Summary: [Dev][Archery][Crossbow] Queue.put() should use Job.queue setter
Key: ARROW-18046
URL: https://issues.apache.org/jira/browse/ARROW-18046
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou

This is related to ARROW-18028. The comment bot reports the following error with ARROW-18028: https://github.com/apache/arrow/pull/14409#issuecomment-1278351434

{noformat}
'NoneType' object has no attribute 'github_commit'
The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/3246777470
{noformat}

https://github.com/apache/arrow/actions/runs/3246777470

{noformat}
ERROR:archery:'NoneType' object has no attribute 'github_commit'
Traceback (most recent call last):
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 153, in handle_issue_comment
    self.handler(command, issue=issue, pull_request=pull,
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 56, in __call__
    return self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/decorators.py", line 38, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 276, in submit
    pull_request.create_issue_comment(report.show())
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/reports.py", line 333, in show
    url=self.task_url(task)
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/reports.py", line 69, in task_url
    if task.status().build_links:
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/core.py", line 869, in status
    github_commit = self._queue.github_commit(self.commit)
AttributeError: 'NoneType' object has no attribute 'github_commit'
{noformat}
[jira] [Created] (ARROW-18045) Cannot install on Ubuntu 20.04
Joshua Wang created ARROW-18045:
--------------------------------

Summary: Cannot install on Ubuntu 20.04
Key: ARROW-18045
URL: https://issues.apache.org/jira/browse/ARROW-18045
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 9.0.0, 6.0.1
Reporter: Joshua Wang
Attachments: arrow_install_logs.txt, pip_install_error.txt

I'm trying to install {{pyarrow}} version {{6.0.1}} on a Raspberry Pi running Ubuntu 20.04, but it fails with the error {{Could NOT find Arrow (missing: Arrow_DIR)}} (full error log for the pip install attached). I also tried running through the Ubuntu install steps [here|https://arrow.apache.org/install/]; it errors out when trying to install {{libarrow-dev}}. I've attached that full output as well. Can someone please let me know what I'm doing wrong?
[jira] [Created] (ARROW-18044) [Java] upgrade error-prone library to 2.16.0
Larry White created ARROW-18044:
--------------------------------

Summary: [Java] upgrade error-prone library to 2.16.0
Key: ARROW-18044
URL: https://issues.apache.org/jira/browse/ARROW-18044
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: Larry White

The current version of error-prone interacts badly with IntelliJ, leading to erroneous (ironically) reports of an error for using "non-standard ascii characters". This causes intermittent but frequent failures of arbitrary tests and is thus crazy-making. See error-prone issue https://github.com/google/error-prone/issues/3092
[GitHub] [arrow-testing] pitrou merged pull request #81: ARROW-18031: [C++][Parquet] Undefined behavior in boolean RLE decoder
pitrou merged PR #81: URL: https://github.com/apache/arrow-testing/pull/81 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-testing] zeroshade opened a new pull request, #81: ARROW-18031: [C++][Parquet] Undefined behavior in boolean RLE decoder
zeroshade opened a new pull request, #81:
URL: https://github.com/apache/arrow-testing/pull/81

Corresponding fix for this issue found in https://github.com/apache/arrow/pull/14407
[jira] [Created] (ARROW-18043) [R] Properly instantiate empty arrays of extension types in Table__from_schema
Nicola Crane created ARROW-18043:
---------------------------------

Summary: [R] Properly instantiate empty arrays of extension types in Table__from_schema
Key: ARROW-18043
URL: https://issues.apache.org/jira/browse/ARROW-18043
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Nicola Crane

The PR for ARROW-12105 introduces the function {{Table__from_schema}}, which creates an empty Table from a Schema object. Currently it can't handle extension types, and instead just returns NULL-type objects.
[jira] [Created] (ARROW-18042) [Java] Distribute Apple M1 compatible JNI libraries via mavencentral
Rok Mihevc created ARROW-18042:
-------------------------------

Summary: [Java] Distribute Apple M1 compatible JNI libraries via mavencentral
Key: ARROW-18042
URL: https://issues.apache.org/jira/browse/ARROW-18042
Project: Apache Arrow
Issue Type: New Feature
Components: Java
Affects Versions: 9.0.0
Reporter: Rok Mihevc

Currently, JNI libraries need to be built locally to be usable on Apple silicon. We should build and distribute compatible libraries via Maven Central. @dsusanibara @lidavidm Also see ARROW-17267 and ARROW-16608.
[jira] [Created] (ARROW-18041) [Python] Substrait-related test failure in wheel tests
Antoine Pitrou created ARROW-18041:
-----------------------------------

Summary: [Python] Substrait-related test failure in wheel tests
Key: ARROW-18041
URL: https://issues.apache.org/jira/browse/ARROW-18041
Project: Apache Arrow
Issue Type: Bug
Components: C++, Packaging, Python
Reporter: Antoine Pitrou
Fix For: 10.0.0

See https://github.com/ursacomputing/crossbow/actions/runs/3240936478/jobs/5312200303#step:7:341

{code}
__ test_run_serialized_query __

tmpdir = local('C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\pytest-of-ContainerAdministrator\\pytest-0\\test_run_serialized_query0')

    def test_run_serialized_query(tmpdir):
        substrait_query = """
        {
            "relations": [
                {"rel": {
                    "read": {
                        "base_schema": {
                            "struct": {
                                "types": [
                                    {"i64": {}}
                                ]
                            },
                            "names": [
                                "foo"
                            ]
                        },
                        "local_files": {
                            "items": [
                                {
                                    "uri_file": "file://FILENAME_PLACEHOLDER",
                                    "arrow": {}
                                }
                            ]
                        }
                    }
                }}
            ]
        }
        """

        file_name = "read_data.arrow"
        table = pa.table([[1, 2, 3, 4, 5]], names=['foo'])
        path = _write_dummy_data_to_disk(tmpdir, file_name, table)
        query = tobytes(substrait_query.replace("FILENAME_PLACEHOLDER", path))

        buf = pa._substrait._parse_json_plan(query)

>       reader = substrait.run_query(buf)

Python\lib\site-packages\pyarrow\tests\test_substrait.py:79:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow\_substrait.pyx:146: in pyarrow._substrait.run_query
    ???
pyarrow\error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Cannot parse URI: 'file://C:UsersContainerAdministratorAppDataLocalTemppytest-of-ContainerAdministratorpytest-0 est_run_serialized_query0 ead_data.arrow'

pyarrow\error.pxi:100: ArrowInvalid
__ test_binary_conversion_with_json_options ___

tmpdir = local('C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\pytest-of-ContainerAdministrator\\pytest-0\\test_binary_conversion_with_js0')

    def test_binary_conversion_with_json_options(tmpdir):
        substrait_query = """
        {
            "relations": [
                {"rel": {
                    "read": {
                        "base_schema": {
                            "struct": {
{code}
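The mangled URI in the error ({{\t}} and {{\r}} in the raw Windows path were swallowed as escape sequences, and the backslashes never became URI slashes) suggests the test splices a native path straight into {{file://FILENAME_PLACEHOLDER}}. As a sketch of the failure mode and one stdlib remedy (not necessarily the fix the test ends up using), {{pathlib}} can build a well-formed file URI from a Windows path:

```python
from pathlib import PureWindowsPath

# A hypothetical Windows temp path like the one in the failing test.
path = r"C:\Users\runner\Temp\read_data.arrow"

# Naive splicing keeps the backslashes, which a URI parser rejects
# (and which, in the CI log above, were partly eaten even earlier
# as \t / \r escape sequences).
naive = "file://" + path

# pathlib produces a proper file URI: forward slashes, leading slash
# before the drive letter.
uri = PureWindowsPath(path).as_uri()
```

Here `uri` comes out as `file:///C:/Users/runner/Temp/read_data.arrow`, which a URI parser accepts.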
[jira] [Created] (ARROW-18040) [Plasma] Remove Plasma
Antoine Pitrou created ARROW-18040:
-----------------------------------

Summary: [Plasma] Remove Plasma
Key: ARROW-18040
URL: https://issues.apache.org/jira/browse/ARROW-18040
Project: Apache Arrow
Issue Type: Task
Components: C++ - Plasma, Documentation, GLib, Java, Python, Ruby
Reporter: Antoine Pitrou

Plasma was deprecated in ARROW-17860.
[jira] [Created] (ARROW-18039) [C++][CI] Reduce MinGW build times
Antoine Pitrou created ARROW-18039:
-----------------------------------

Summary: [C++][CI] Reduce MinGW build times
Key: ARROW-18039
URL: https://issues.apache.org/jira/browse/ARROW-18039
Project: Apache Arrow
Issue Type: Wish
Components: C++, Continuous Integration
Reporter: Antoine Pitrou

The MinGW C++ builds on CI currently build in release mode. This is probably because debug builds on Windows are complicated (you must get all the dependencies also compiled in debug mode, AFAIU). However, we could probably disable optimizations so as to reduce compilation times. The compilation flags are currently as follows:

{code}
-- CMAKE_C_FLAGS: -O2 -DNDEBUG -ftree-vectorize -Wa,-mbig-obj -Wall -Wno-conversion -Wno-sign-conversion -Wunused-result -fno-semantic-interposition -mxsave -msse4.2
-- CMAKE_CXX_FLAGS: -Wno-noexcept-type -fdiagnostics-color=always -O2 -DNDEBUG -ftree-vectorize -Wa,-mbig-obj -Wall -Wno-conversion -Wno-sign-conversion -Wunused-result -fno-semantic-interposition -mxsave -msse4.2
{code}

Perhaps we can pass {{-O0}}?
[jira] [Created] (ARROW-18038) [Archery][CI] Refactor git dependencies used on archery to be more consistent
Raúl Cumplido created ARROW-18038:
----------------------------------

Summary: [Archery][CI] Refactor git dependencies used on archery to be more consistent
Key: ARROW-18038
URL: https://issues.apache.org/jira/browse/ARROW-18038
Project: Apache Arrow
Issue Type: Improvement
Components: Archery
Reporter: Raúl Cumplido

Currently, Archery has the following git-related dependencies:

{code:java}
'release': ['gitpython']
'crossbow': ['github3.py', 'pygit2>=1.6.0']
'crossbow-upload': ['github3.py']
'bot': ['github3.py', 'pygit2>=1.6.0', 'pygithub']{code}

This makes it difficult to work with Archery's git-related code and hampers code reuse. As an example, see the comment on this PR: [https://github.com/apache/arrow/pull/14033#discussion_r993778812]

{code:java}
While dev/archery/archery/crossbow/core.py uses pygit2, dev/archery/archery/release/core.py uses GitPython. The Repo class that is used in each module are also not shared.
{code}

We should refactor Archery to not require two different GitHub libraries (github3 and pygithub) and two different git ones (pygit2 and gitpython).
[jira] [Created] (ARROW-18037) [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns
Antoine Pitrou created ARROW-18037:
-----------------------------------

Summary: [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns
Key: ARROW-18037
URL: https://issues.apache.org/jira/browse/ARROW-18037
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Antoine Pitrou

As found while working on ARROW-18004: the dataset scanner and the Acero engine rely on {{ExecBatch::ToRecordBatch}} returning successfully when the given schema has fewer fields than the ExecBatch has columns. This apparently allows the dataset-added columns ({{kAugmentedFields}} in {{arrow/dataset/scanner.cc}}) to be implicitly dropped from a scan's final result. However, it seems wrong and brittle to do this implicitly at the {{ExecBatch::ToRecordBatch}} level (hiding potential errors). Instead, it should probably be done explicitly inside Acero/dataset.
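To make the hazard concrete, here is a toy Python model (not the Arrow C++ API; the function and column names are invented for illustration): a conversion that silently truncates extra columns hides genuine schema mismatches, whereas dropping the augmented columns explicitly keeps the strict check intact.

```python
# Toy model of the issue above -- not the Arrow C++ API.
# A strict batch->record conversion refuses silent truncation; dropping
# engine-added ("augmented") columns is a separate, explicit step.

def to_record_batch(columns, schema_names, *, allow_truncate=False):
    """Pair columns with schema names; refuse silent truncation."""
    if len(columns) != len(schema_names):
        if not (allow_truncate and len(columns) > len(schema_names)):
            raise ValueError(
                f"schema has {len(schema_names)} fields but batch has "
                f"{len(columns)} columns")
        columns = columns[:len(schema_names)]
    return dict(zip(schema_names, columns))


def drop_augmented(columns, names, augmented=frozenset({"__filename"})):
    """Explicitly strip illustrative dataset-added columns first."""
    kept = [(n, c) for n, c in zip(names, columns) if n not in augmented]
    return [c for _, c in kept], [n for n, _ in kept]
```

With this split, a column-count mismatch in ordinary code raises immediately, and only the scanner's final step opts into dropping its own augmented columns.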
[jira] [Created] (ARROW-18036) [C++] Use BUILD_TESTING=OFF for abseil-cpp
Neal Richardson created ARROW-18036:
------------------------------------

Summary: [C++] Use BUILD_TESTING=OFF for abseil-cpp
Key: ARROW-18036
URL: https://issues.apache.org/jira/browse/ARROW-18036
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Neal Richardson
Assignee: Neal Richardson

In https://github.com/abseil/abseil-cpp/commit/a50ae369a30f99f79d7559002aba3413dac1bd48, the argument changed from {{ABSL_RUN_TESTS}} to {{BUILD_TESTING}}. A verbose thirdparty build now shows that ABSL_RUN_TESTS is being ignored.
[jira] [Created] (ARROW-18035) [Java] Enable allocator logging in CI
David Li created ARROW-18035:
-----------------------------

Summary: [Java] Enable allocator logging in CI
Key: ARROW-18035
URL: https://issues.apache.org/jira/browse/ARROW-18035
Project: Apache Arrow
Issue Type: Improvement
Components: Java
Reporter: David Li

This would help track down certain flaky tests.
[jira] [Created] (ARROW-18034) [Java][FlightRPC] TestBasicOperation.getStreamLargeBatch is flaky on Windows CI
David Li created ARROW-18034:
-----------------------------

Summary: [Java][FlightRPC] TestBasicOperation.getStreamLargeBatch is flaky on Windows CI
Key: ARROW-18034
URL: https://issues.apache.org/jira/browse/ARROW-18034
Project: Apache Arrow
Issue Type: Bug
Components: FlightRPC, Java
Reporter: David Li

{noformat}
java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (134217728)
Allocator(ROOT) 0/134217728/270532608/9223372036854775807 (res/actual/peak/limit)
	at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:437)
	at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
	at org.apache.arrow.flight.TestBasicOperation$Producer.close(TestBasicOperation.java:514)
	at org.apache.arrow.flight.TestBasicOperation.test(TestBasicOperation.java:333)
	at org.apache.arrow.flight.TestBasicOperation.test(TestBasicOperation.java:312)
	at org.apache.arrow.flight.TestBasicOperation.getStreamLargeBatch(TestBasicOperation.java:270)
{noformat}
[jira] [Created] (ARROW-18033) [CI] set-output in GHA is deprecated
Neal Richardson created ARROW-18033:
------------------------------------

Summary: [CI] set-output in GHA is deprecated
Key: ARROW-18033
URL: https://issues.apache.org/jira/browse/ARROW-18033
Project: Apache Arrow
Issue Type: Improvement
Components: Continuous Integration
Reporter: Neal Richardson

See https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
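Per the linked changelog, the replacement for the deprecated {{::set-output}} workflow command is appending {{name=value}} lines to the file named by the {{GITHUB_OUTPUT}} environment variable (in a shell step, {{echo "foo=bar" >> "$GITHUB_OUTPUT"}}). For steps implemented in Python, a minimal sketch of the same pattern (the helper name is mine, not a GitHub API):

```python
import os


def set_output(name, value):
    """Append a name=value step output to the file GitHub Actions
    provides via the GITHUB_OUTPUT environment variable -- the
    replacement for `echo "::set-output name=...::..."`."""
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")
```

Downstream steps then read the value as `steps.<id>.outputs.<name>`, exactly as with the old command.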
[jira] [Created] (ARROW-18032) pyarrow no go with pip3 and py-3.11rc2
Aleksandar created ARROW-18032:
-------------------------------

Summary: pyarrow no go with pip3 and py-3.11rc2
Key: ARROW-18032
URL: https://issues.apache.org/jira/browse/ARROW-18032
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 9.0.0
Reporter: Aleksandar

I tried with version 9.0.0 and the testing versions. Every time, the same thing:

{noformat}
CMake Error at /usr/local/share/cmake-3.21/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Python3 (missing: Python3_NumPy_INCLUDE_DIRS NumPy) (found version "3.11.0")
{noformat}

I am not sure where this can be changed in the templates to enable support for 3.11. Regards,
[jira] [Created] (ARROW-18031) [C++][Parquet] Undefined behavior in boolean RLE decoder
Antoine Pitrou created ARROW-18031:
-----------------------------------

Summary: [C++][Parquet] Undefined behavior in boolean RLE decoder
Key: ARROW-18031
URL: https://issues.apache.org/jira/browse/ARROW-18031
Project: Apache Arrow
Issue Type: Bug
Components: C++, Parquet
Reporter: Antoine Pitrou
Fix For: 10.0.0

A fuzzing run found this undefined behavior, which hints that the RLE boolean decoder implementation is wrong:

{code}
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77a45859 in __GI_abort () at abort.c:79
#2  0x5beafa07 in __sanitizer::Abort() ()
#3  0x5bead8a1 in __sanitizer::Die() ()
#4  0x5bec15cc in __ubsan::ScopedReport::~ScopedReport() ()
#5  0x5bec437b in handleLoadInvalidValue(__ubsan::InvalidValueData*, unsigned long, __ubsan::ReportOptions) ()
#6  0x5bec43be in __ubsan_handle_load_invalid_value_abort ()
#7  0x5c5acb9b in arrow::bit_util::BitReader::GetAligned (this=0x60701060, num_bytes=1, v=0x7fff99d0) at /home/antoine/arrow/dev/cpp/src/arrow/util/bit_stream_utils.h:415
#8  0x5c5aa7d4 in arrow::util::RleDecoder::NextCounts (this=0x60701060) at /home/antoine/arrow/dev/cpp/src/arrow/util/rle_encoding.h:663
#9  0x5c5a7328 in arrow::util::RleDecoder::GetBatch (this=0x60701060, values=0x75408000, batch_size=2089) at /home/antoine/arrow/dev/cpp/src/arrow/util/rle_encoding.h:329
#10 0x5c59834e in parquet::(anonymous namespace)::RleBooleanDecoder::Decode (this=0x60603ce0, buffer=0x75408000, max_values=2089) at /home/antoine/arrow/dev/cpp/src/parquet/encoding.cc:2388
#11 0x5c4f43d9 in parquet::internal::(anonymous namespace)::TypedRecordReader >::ReadValuesDense (this=0x61401050, values_to_read=2089) at /home/antoine/arrow/dev/cpp/src/parquet/column_reader.cc:1531
#12 0x5c4f7668 in parquet::internal::(anonymous namespace)::TypedRecordReader >::ReadRecordData (this=0x61401050, num_records=2089) at /home/antoine/arrow/dev/cpp/src/parquet/column_reader.cc:1575
#13 0x5c4f03e5 in parquet::internal::(anonymous namespace)::TypedRecordReader >::ReadRecords (this=0x61401050, num_records=2089) at /home/antoine/arrow/dev/cpp/src/parquet/column_reader.cc:1331
#14 0x5bf0acee in parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch (this=0x60801020, records_to_read=2089) at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:479
#15 0x5bf019df in parquet::arrow::ColumnReaderImpl::NextBatch (this=0x60801020, batch_size=2089, out=0x7fffb740) at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:109
#16 0x5bf78829 in parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn (this=0x61301a80, i=0, row_groups=std::vector of length 1, capacity 1 = {...}, reader=0x60801020, out=0x7fffb740) at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:285
#17 0x5bff1b9c in parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr, std::vector > const&, std::vector > const&, arrow::internal::Executor*)::$_4::operator()(unsigned long, std::shared_ptr) const (this=0x7fffbdc0, i=0, reader=warning: RTTI symbol not found for class 'std::_Sp_counted_deleter, std::allocator, (__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 'std::_Sp_counted_deleter, std::allocator, (__gnu_cxx::_Lock_policy)2>'
std::shared_ptr (use count 2, weak count 0) = {...}) at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:1236
#18 0x5bfed49d in arrow::internal::OptionalParallelForAsync, std::vector > const&, std::vector > const&, arrow::internal::Executor*)::$_4&, std::shared_ptr, std::shared_ptr >(bool, std::vector, std::allocator > >, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr, std::vector > const&, std::vector > const&, arrow::internal::Executor*)::$_4&, arrow::internal::Executor*) (use_threads=false, inputs=std::vector of length 1, capacity 1 = {...}, func=..., executor=0x60402b90) at /home/antoine/arrow/dev/cpp/src/arrow/util/parallel.h:95
#19 0x5bfebe4c in parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups (this=0x61301a80, self=std::shared_ptr (empty) = {...}, row_groups=std::vector of length 1, capacity 1 = {...}, column_indices=std::vector of length 1, capacity 1 = {...}, cpu_executor=0x60402b90) at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:1254
#20 0x5bee0d57 in parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadRowGroups (this=0x61301a80, row_groups=std::vector of length 1, capacity 1 = {...}, column_indices=std::vector of length 1, capacity 1 = {...}, out=0x7fffc880) at
{code}
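The interesting frames are #7–#9: {{NextCounts}} calls {{GetAligned}}, which reads an uninitialized or out-of-range value from the input. For intuition only, here is a toy Python model of run-length decoding for booleans with the bounds and value checks a safe decoder needs before every read; it is deliberately simplified (plain (count, value) byte pairs, not Parquet's actual RLE/bit-packed hybrid):

```python
# Toy run-length decoder for booleans -- illustrative only, NOT
# Parquet's RLE/bit-packed hybrid format. The point is the guards:
# never read past the buffer, never trust a value byte blindly.

def decode_bool_rle(data: bytes, num_values: int) -> list:
    """Decode (count, value) byte pairs into a list of bools."""
    out = []
    pos = 0
    while len(out) < num_values:
        if pos + 2 > len(data):
            # The kind of check whose absence the UBSan report exposes.
            raise ValueError("truncated RLE input")
        count, value = data[pos], data[pos + 1]
        if value not in (0, 1):
            raise ValueError("invalid boolean value in RLE run")
        out.extend([bool(value)] * count)
        pos += 2
    return out[:num_values]
```

For example, `decode_bool_rle(bytes([3, 1, 2, 0]), 5)` yields three `True` followed by two `False`, while a truncated buffer raises instead of reading past the end.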
[jira] [Created] (ARROW-18030) [C++] Bump lz4 to 1.9.4
Antoine Pitrou created ARROW-18030:
-----------------------------------

Summary: [C++] Bump lz4 to 1.9.4
Key: ARROW-18030
URL: https://issues.apache.org/jira/browse/ARROW-18030
Project: Apache Arrow
Issue Type: Task
Components: C++
Reporter: Antoine Pitrou
Fix For: 10.0.0

We currently vendor a development version of lz4 to get some required stability fixes. We should bump to 1.9.4, which was recently released: https://github.com/lz4/lz4/releases
[jira] [Created] (ARROW-18029) [Format] archery lint for cmake should show error details
Yaron Gvili created ARROW-18029:
--------------------------------

Summary: [Format] archery lint for cmake should show error details
Key: ARROW-18029
URL: https://issues.apache.org/jira/browse/ARROW-18029
Project: Apache Arrow
Issue Type: Improvement
Components: Format
Reporter: Yaron Gvili

Here is example output from a failed invocation of `archery lint --cmake-format`:

{noformat}
INFO:archery:Running cmake-format linters
ERROR __main__.py:618: Check failed: /arrow/cpp/cmake_modules/ThirdpartyToolchain.cmake
{noformat}

It would be helpful to get the error details on failure, e.g. as a diff output like for C++. Granted, this may be low priority since `archery lint --cmake-format --fix` fixes the errors.
[jira] [Created] (ARROW-18028) [Dev][Archery][Crossbow] Always use GitHub Action's build URL in PR comment
Kouhei Sutou created ARROW-18028:
------------------------------------

Summary: [Dev][Archery][Crossbow] Always use GitHub Action's build URL in PR comment
Key: ARROW-18028
URL: https://issues.apache.org/jira/browse/ARROW-18028
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
[jira] [Created] (ARROW-18027) [Dev][Archery][Crossbow] Reuse GitHub Token
Kouhei Sutou created ARROW-18027:
------------------------------------

Summary: [Dev][Archery][Crossbow] Reuse GitHub Token
Key: ARROW-18027
URL: https://issues.apache.org/jira/browse/ARROW-18027
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
[jira] [Created] (ARROW-18026) [C++][Gandiva] Add div and mod functions for unsigned ints
Jin Shang created ARROW-18026:
------------------------------

Summary: [C++][Gandiva] Add div and mod functions for unsigned ints
Key: ARROW-18026
URL: https://issues.apache.org/jira/browse/ARROW-18026
Project: Apache Arrow
Issue Type: Improvement
Components: C++ - Gandiva
Reporter: Jin Shang