[jira] [Created] (ARROW-18047) [Dev][Archery][Crossbow] Queue.put() should use Job.queue setter

2022-10-13 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-18047:


 Summary: [Dev][Archery][Crossbow] Queue.put() should use Job.queue 
setter
 Key: ARROW-18047
 URL: https://issues.apache.org/jira/browse/ARROW-18047
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


This is related to ARROW-18028.

Comment bot reports the following error with ARROW-18028:

https://github.com/apache/arrow/pull/14409#issuecomment-1278351434

{noformat}
'NoneType' object has no attribute 'github_commit'
The Archery job run can be found at: 
https://github.com/apache/arrow/actions/runs/3246777470
{noformat}

https://github.com/apache/arrow/actions/runs/3246777470

{noformat}
ERROR:archery:'NoneType' object has no attribute 'github_commit'
Traceback (most recent call last):
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 
153, in handle_issue_comment
self.handler(command, issue=issue, pull_request=pull,
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 
56, in __call__
return self.invoke(ctx)
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py",
 line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py",
 line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py",
 line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py",
 line 760, in invoke
return __callback(*args, **kwargs)
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/decorators.py",
 line 38, in new_func
return f(get_current_context().obj, *args, **kwargs)
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 
276, in submit
pull_request.create_issue_comment(report.show())
  File 
"/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/reports.py", 
line 333, in show
url=self.task_url(task)
  File 
"/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/reports.py", 
line 69, in task_url
if task.status().build_links:
  File 
"/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/core.py", 
line 869, in status
github_commit = self._queue.github_commit(self.commit)
AttributeError: 'NoneType' object has no attribute 'github_commit'
{noformat}
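
The traceback ends in {{self._queue.github_commit(...)}} being called on a task whose {{_queue}} is still {{None}}. Below is a minimal sketch of how that can happen when a queue is attached to a job without going through a propagating setter. The classes are simplified, hypothetical stand-ins for illustration, not the real Archery implementation, and {{put_bypassing_setter}} is an invented name for the problematic pattern.

```python
class Task:
    """Hypothetical stand-in for a Crossbow task."""

    def __init__(self, commit):
        self.commit = commit
        self._queue = None  # stays None until a job's setter propagates the queue

    def status(self):
        # Mirrors the failing call in the traceback above.
        return self._queue.github_commit(self.commit)


class Job:
    """Hypothetical stand-in for a Crossbow job that owns several tasks."""

    def __init__(self, tasks):
        self.tasks = tasks
        self._queue = None

    @property
    def queue(self):
        return self._queue

    @queue.setter
    def queue(self, queue):
        # The setter is the single place that keeps job and tasks in sync.
        self._queue = queue
        for task in self.tasks.values():
            task._queue = queue


class Queue:
    """Hypothetical stand-in for the Crossbow queue."""

    def github_commit(self, sha):
        return f"commit:{sha}"

    def put_bypassing_setter(self, job):
        # Assigning the private attribute skips the propagation to tasks.
        job._queue = self

    def put(self, job):
        # Going through the property also updates every task.
        job.queue = self
```

With this sketch, calling {{status()}} on a task after {{put_bypassing_setter()}} raises the same {{AttributeError}}, while after {{put()}} it returns normally.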



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-18046) [Dev][Archery][Crossbow] Queue.put() should use Job.queue setter

2022-10-13 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-18046:


 Summary: [Dev][Archery][Crossbow] Queue.put() should use Job.queue 
setter
 Key: ARROW-18046
 URL: https://issues.apache.org/jira/browse/ARROW-18046
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou


This is related to ARROW-18028.

Comment bot reports the following error with ARROW-18028:

https://github.com/apache/arrow/pull/14409#issuecomment-1278351434

{noformat}
'NoneType' object has no attribute 'github_commit'
The Archery job run can be found at: 
https://github.com/apache/arrow/actions/runs/3246777470
{noformat}

https://github.com/apache/arrow/actions/runs/3246777470

{noformat}
ERROR:archery:'NoneType' object has no attribute 'github_commit'
Traceback (most recent call last):
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 
153, in handle_issue_comment
self.handler(command, issue=issue, pull_request=pull,
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 
56, in __call__
return self.invoke(ctx)
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py",
 line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py",
 line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py",
 line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/core.py",
 line 760, in invoke
return __callback(*args, **kwargs)
  File 
"/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/click/decorators.py",
 line 38, in new_func
return f(get_current_context().obj, *args, **kwargs)
  File "/home/runner/work/arrow/arrow/arrow/dev/archery/archery/bot.py", line 
276, in submit
pull_request.create_issue_comment(report.show())
  File 
"/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/reports.py", 
line 333, in show
url=self.task_url(task)
  File 
"/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/reports.py", 
line 69, in task_url
if task.status().build_links:
  File 
"/home/runner/work/arrow/arrow/arrow/dev/archery/archery/crossbow/core.py", 
line 869, in status
github_commit = self._queue.github_commit(self.commit)
AttributeError: 'NoneType' object has no attribute 'github_commit'
{noformat}





[jira] [Created] (ARROW-18045) Cannot install on Ubuntu 20.04

2022-10-13 Thread Joshua Wang (Jira)
Joshua Wang created ARROW-18045:
---

 Summary: Cannot install on Ubuntu 20.04
 Key: ARROW-18045
 URL: https://issues.apache.org/jira/browse/ARROW-18045
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 9.0.0, 6.0.1
Reporter: Joshua Wang
 Attachments: arrow_install_logs.txt, pip_install_error.txt

I'm trying to install {{pyarrow}} version {{6.0.1}} on a Raspberry Pi running 
Ubuntu 20.04, but it fails with the error {{Could NOT find Arrow (missing: 
Arrow_DIR)}} (the full error log for pip install is attached).

I tried following the Ubuntu install steps 
[here|https://arrow.apache.org/install/].
It errors out when trying to install {{libarrow-dev}}. I've attached the full 
output as well.

Can someone please let me know what I'm doing wrong?





[jira] [Created] (ARROW-18044) [Java] upgrade error-prone library to 2.16.0

2022-10-13 Thread Larry White (Jira)
Larry White created ARROW-18044:
---

 Summary: [Java] upgrade error-prone library to 2.16.0
 Key: ARROW-18044
 URL: https://issues.apache.org/jira/browse/ARROW-18044
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Larry White


The current version of errorprone interacts badly with IntelliJ, leading to 
erroneous (ironically) reporting of an error for using "non-standard ascii 
characters".

This causes intermittent but frequent failures of arbitrary tests and is thus 
crazy-making.

See Errorprone issue https://github.com/google/error-prone/issues/3092





[GitHub] [arrow-testing] pitrou merged pull request #81: ARROW-18031: [C++][Parquet] Undefined behavior in boolean RLE decoder

2022-10-13 Thread GitBox


pitrou merged PR #81:
URL: https://github.com/apache/arrow-testing/pull/81


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-testing] zeroshade opened a new pull request, #81: ARROW-18031: [C++][Parquet] Undefined behavior in boolean RLE decoder

2022-10-13 Thread GitBox


zeroshade opened a new pull request, #81:
URL: https://github.com/apache/arrow-testing/pull/81

   Corresponding Fix for this issue found in 
https://github.com/apache/arrow/pull/14407





[jira] [Created] (ARROW-18043) [R] Properly instantiate empty arrays of extension types in Table__from_schema

2022-10-13 Thread Nicola Crane (Jira)
Nicola Crane created ARROW-18043:


 Summary: [R] Properly instantiate empty arrays of extension types 
in Table__from_schema
 Key: ARROW-18043
 URL: https://issues.apache.org/jira/browse/ARROW-18043
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Nicola Crane


The PR for ARROW-12105 introduces the function Table__from_schema which creates 
an empty Table from a Schema object.  Currently it can't handle extension 
types, and instead just returns NULL type objects.





[jira] [Created] (ARROW-18042) [Java] Distribute Apple M1 compatible JNI libraries via mavencentral

2022-10-13 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-18042:
--

 Summary: [Java] Distribute Apple M1 compatible JNI libraries via 
mavencentral
 Key: ARROW-18042
 URL: https://issues.apache.org/jira/browse/ARROW-18042
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Java
Affects Versions: 9.0.0
Reporter: Rok Mihevc


Currently JNI libraries need to be built locally to be usable on Apple silicon. 
We should build and distribute compatible libraries via mavencentral.

@dsusanibara @lidavidm

Also see ARROW-17267 and ARROW-16608





[jira] [Created] (ARROW-18041) [Python] Substrait-related test failure in wheel tests

2022-10-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-18041:
--

 Summary: [Python] Substrait-related test failure in wheel tests
 Key: ARROW-18041
 URL: https://issues.apache.org/jira/browse/ARROW-18041
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Packaging, Python
Reporter: Antoine Pitrou
 Fix For: 10.0.0


See 
https://github.com/ursacomputing/crossbow/actions/runs/3240936478/jobs/5312200303#step:7:341
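
One plausible reading of the log that follows: the test pastes a raw Windows path into JSON text, so the backslashes are treated as escape sequences ({{\t}}, {{\r}}) or dropped, which would explain the mangled URI in the {{Cannot parse URI}} message. The sketch below illustrates the general hazard with the standard library only. The template, the helper names and the sample path are assumptions made for illustration, and Python's strict {{json}} module rejects the text outright rather than mangling it the way the parser in the log appears to.

```python
import json
from pathlib import PureWindowsPath

# Hypothetical, cut-down template standing in for the plan used by the test.
TEMPLATE = '{"uri_file": "URI_PLACEHOLDER"}'


def naive_plan(path: str) -> str:
    """Paste the raw path into the JSON text without any escaping."""
    return TEMPLATE.replace("URI_PLACEHOLDER", "file://" + path)


def portable_plan(path: str) -> str:
    """Convert the path to forward slashes first, so no backslash reaches the JSON."""
    return TEMPLATE.replace(
        "URI_PLACEHOLDER", "file:///" + PureWindowsPath(path).as_posix()
    )


# Illustrative path; note the components starting with "t" and "r".
win_path = r"C:\Users\runner\pytest-0\test_run_serialized_query0\read_data.arrow"
```

Here {{json.loads(naive_plan(win_path))}} raises a {{ValueError}} because of the invalid escapes, while {{json.loads(portable_plan(win_path))}} parses cleanly. Whether the forward-slash form is the right input for the Substrait consumer is not something this sketch establishes.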

{code}

2022-10-13T09:42:51.5203618Z __ 
test_run_serialized_query __
2022-10-13T09:42:51.5203890Z 
2022-10-13T09:42:51.5204391Z tmpdir = 
local('C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\pytest-of-ContainerAdministrator\\pytest-0\\test_run_serialized_query0')
2022-10-13T09:42:51.5205282Z 
2022-10-13T09:42:51.5205769Z def test_run_serialized_query(tmpdir):
2022-10-13T09:42:51.5206172Z substrait_query = """
2022-10-13T09:42:51.5206505Z {
2022-10-13T09:42:51.5206828Z "relations": [
2022-10-13T09:42:51.5207175Z {"rel": {
2022-10-13T09:42:51.5207501Z "read": {
2022-10-13T09:42:51.5207800Z "base_schema": {
2022-10-13T09:42:51.5208155Z "struct": {
2022-10-13T09:42:51.5208491Z "types": [
2022-10-13T09:42:51.5208841Z {"i64": {}}
2022-10-13T09:42:51.5209182Z ]
2022-10-13T09:42:51.5209501Z },
2022-10-13T09:42:51.5209829Z "names": [
2022-10-13T09:42:51.5210168Z "foo"
2022-10-13T09:42:51.5210611Z ]
2022-10-13T09:42:51.5211097Z },
2022-10-13T09:42:51.5211453Z "local_files": {
2022-10-13T09:42:51.5211747Z "items": [
2022-10-13T09:42:51.5212083Z {
2022-10-13T09:42:51.5212530Z "uri_file": 
"file://FILENAME_PLACEHOLDER",
2022-10-13T09:42:51.5212930Z "arrow": {}
2022-10-13T09:42:51.5213261Z }
2022-10-13T09:42:51.5213579Z ]
2022-10-13T09:42:51.5213885Z }
2022-10-13T09:42:51.5214188Z }
2022-10-13T09:42:51.5214491Z }}
2022-10-13T09:42:51.5214795Z ]
2022-10-13T09:42:51.5215053Z }
2022-10-13T09:42:51.5215399Z """
2022-10-13T09:42:51.5215708Z 
2022-10-13T09:42:51.5355345Z file_name = "read_data.arrow"
2022-10-13T09:42:51.5356563Z table = pa.table([[1, 2, 3, 4, 5]], 
names=['foo'])
2022-10-13T09:42:51.5360922Z path = _write_dummy_data_to_disk(tmpdir, 
file_name, table)
2022-10-13T09:42:51.5361743Z query = 
tobytes(substrait_query.replace("FILENAME_PLACEHOLDER", path))
2022-10-13T09:42:51.5362170Z 
2022-10-13T09:42:51.5362589Z buf = pa._substrait._parse_json_plan(query)
2022-10-13T09:42:51.5362990Z 
2022-10-13T09:42:51.5363388Z >   reader = substrait.run_query(buf)
2022-10-13T09:42:51.5363692Z 
2022-10-13T09:42:51.5364018Z 
Python\lib\site-packages\pyarrow\tests\test_substrait.py:79: 
2022-10-13T09:42:51.5364520Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2022-10-13T09:42:51.5365008Z pyarrow\_substrait.pyx:146: in 
pyarrow._substrait.run_query
2022-10-13T09:42:51.5365444Z ???
2022-10-13T09:42:51.5365903Z pyarrow\error.pxi:144: in 
pyarrow.lib.pyarrow_internal_check_status
2022-10-13T09:42:51.5366352Z ???
2022-10-13T09:42:51.5366746Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _
2022-10-13T09:42:51.5367047Z 
2022-10-13T09:42:51.5367246Z >   ???
2022-10-13T09:42:51.5376405Z E   pyarrow.lib.ArrowInvalid: Cannot parse URI: 
'file://C:UsersContainerAdministratorAppDataLocalTemppytest-of-ContainerAdministratorpytest-0
  est_run_serialized_query0
2022-10-13T09:42:51.5377196Z ead_data.arrow'
2022-10-13T09:42:51.5377363Z 
2022-10-13T09:42:51.5377519Z pyarrow\error.pxi:100: ArrowInvalid
2022-10-13T09:42:51.5377857Z __ 
test_binary_conversion_with_json_options ___
2022-10-13T09:42:51.5378087Z 
2022-10-13T09:42:51.5378488Z tmpdir = 
local('C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\pytest-of-ContainerAdministrator\\pytest-0\\test_binary_conversion_with_js0')
2022-10-13T09:42:51.5378905Z 
2022-10-13T09:42:51.5379091Z def 
test_binary_conversion_with_json_options(tmpdir):
2022-10-13T09:42:51.5379432Z substrait_query = """
2022-10-13T09:42:51.5379695Z {
2022-10-13T09:42:51.5379951Z "relations": [
2022-10-13T09:42:51.5380229Z {"rel": {
2022-10-13T09:42:51.5380492Z "read": {
2022-10-13T09:42:51.5380954Z "base_schema": {
2022-10-13T09:42:51.5381237Z "struct": {
2022-10-13T09:42:51.5381473Z  

[jira] [Created] (ARROW-18040) [Plasma] Remove Plasma

2022-10-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-18040:
--

 Summary: [Plasma] Remove Plasma
 Key: ARROW-18040
 URL: https://issues.apache.org/jira/browse/ARROW-18040
 Project: Apache Arrow
  Issue Type: Task
  Components: C++ - Plasma, Documentation, GLib, Java, Python, Ruby
Reporter: Antoine Pitrou


Plasma was deprecated in ARROW-17860.





[jira] [Created] (ARROW-18039) [C++][CI] Reduce MinGW build times

2022-10-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-18039:
--

 Summary: [C++][CI] Reduce MinGW build times
 Key: ARROW-18039
 URL: https://issues.apache.org/jira/browse/ARROW-18039
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


The MinGW C++ builds on CI currently build in release mode. This is probably 
because debug builds on Windows are complicated (you must get all the 
dependencies also compiled in debug mode, AFAIU).

However, we could probably disable optimizations, so as to reduce compilation 
times.
The compilation flags are currently as follows:
{code}
-- CMAKE_C_FLAGS:  -O2 -DNDEBUG -ftree-vectorize  -Wa,-mbig-obj -Wall 
-Wno-conversion -Wno-sign-conversion -Wunused-result 
-fno-semantic-interposition -mxsave -msse4.2 
-- CMAKE_CXX_FLAGS:  -Wno-noexcept-type  -fdiagnostics-color=always -O2 
-DNDEBUG -ftree-vectorize  -Wa,-mbig-obj -Wall -Wno-conversion 
-Wno-sign-conversion -Wunused-result -fno-semantic-interposition -mxsave 
-msse4.2 
{code}

Perhaps we can pass {{-O0}}?





[jira] [Created] (ARROW-18038) [Archery][CI] Refactor git dependencies used on archery to be more consistent

2022-10-13 Thread Jira
Raúl Cumplido created ARROW-18038:
-

 Summary: [Archery][CI] Refactor git dependencies used on archery 
to be more consistent
 Key: ARROW-18038
 URL: https://issues.apache.org/jira/browse/ARROW-18038
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery
Reporter: Raúl Cumplido


Currently archery has the following git related dependencies:
{code:java}
'release': ['gitpython']
'crossbow': ['github3.py', 'pygit2>=1.6.0']
'crossbow-upload': ['github3.py']
'bot': ['github3.py', 'pygit2>=1.6.0', 'pygithub']
{code}
This makes it difficult to work with Archery's git-related code and makes code 
reuse harder. As an example, see the comment on this PR: 
[https://github.com/apache/arrow/pull/14033#discussion_r993778812]
{code:java}
While dev/archery/archery/crossbow/core.py uses pygit2, 
dev/archery/archery/release/core.py uses GitPython. The Repo class that is used 
in each module are also not shared. {code}
We should refactor archery to not require 2 different GitHub libraries 
(github3.py and pygithub) and 2 different git ones (pygit2 and gitpython).





[jira] [Created] (ARROW-18037) [C++] Acero/dataset relies on ExecBatch::ToRecordBatch truncating excess columns

2022-10-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-18037:
--

 Summary: [C++] Acero/dataset relies on ExecBatch::ToRecordBatch 
truncating excess columns
 Key: ARROW-18037
 URL: https://issues.apache.org/jira/browse/ARROW-18037
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou


As found while working on ARROW-18004: the dataset scanner and the Acero engine 
rely on {{ExecBatch::ToRecordBatch}} returning successfully when the given 
schema has fewer fields than the ExecBatch has columns.

This apparently allows implicitly dropping the dataset-added columns 
({{kAugmentedFields}} in {{arrow/dataset/scanner.cc}}) from a scan's final 
result.

However, it seems wrong and brittle to do this implicitly at the 
{{ExecBatch::ToRecordBatch}} level (hiding potential errors). Instead, it 
should probably be done explicitly inside Acero/dataset.






[jira] [Created] (ARROW-18036) [C++] Use BUILD_TESTING=OFF for abseil-cpp

2022-10-13 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-18036:
---

 Summary: [C++] Use BUILD_TESTING=OFF for abseil-cpp
 Key: ARROW-18036
 URL: https://issues.apache.org/jira/browse/ARROW-18036
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Neal Richardson
Assignee: Neal Richardson


In 
https://github.com/abseil/abseil-cpp/commit/a50ae369a30f99f79d7559002aba3413dac1bd48,
 the argument changed from {{ABSL_RUN_TESTS}} to {{BUILD_TESTING}}. A verbose 
thirdparty build now shows that ABSL_RUN_TESTS is being ignored. 





[jira] [Created] (ARROW-18035) [Java] Enable allocator logging in CI

2022-10-13 Thread David Li (Jira)
David Li created ARROW-18035:


 Summary: [Java] Enable allocator logging in CI
 Key: ARROW-18035
 URL: https://issues.apache.org/jira/browse/ARROW-18035
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


This would help track down certain flaky tests.





[jira] [Created] (ARROW-18034) [Java][FlightRPC] TestBasicOperation.getStreamLargeBatch is flaky on Windows CI

2022-10-13 Thread David Li (Jira)
David Li created ARROW-18034:


 Summary: [Java][FlightRPC] TestBasicOperation.getStreamLargeBatch 
is flaky on Windows CI
 Key: ARROW-18034
 URL: https://issues.apache.org/jira/browse/ARROW-18034
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC, Java
Reporter: David Li


{noformat}
java.lang.IllegalStateException: 
Memory was leaked by query. Memory leaked: (134217728)
Allocator(ROOT) 0/134217728/270532608/9223372036854775807 
(res/actual/peak/limit)

at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:437)
at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
at 
org.apache.arrow.flight.TestBasicOperation$Producer.close(TestBasicOperation.java:514)
at 
org.apache.arrow.flight.TestBasicOperation.test(TestBasicOperation.java:333)
at 
org.apache.arrow.flight.TestBasicOperation.test(TestBasicOperation.java:312)
at 
org.apache.arrow.flight.TestBasicOperation.getStreamLargeBatch(TestBasicOperation.java:270)
{noformat}





[jira] [Created] (ARROW-18033) [CI] set-output in GHA is deprecated

2022-10-13 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-18033:
---

 Summary: [CI] set-output in GHA is deprecated
 Key: ARROW-18033
 URL: https://issues.apache.org/jira/browse/ARROW-18033
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Neal Richardson


See 
https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
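
The replacement described in that changelog is to append {{name=value}} lines to the file named by the {{GITHUB_OUTPUT}} environment variable instead of printing a {{::set-output}} workflow command. A minimal sketch for scripts that set outputs from Python follows. The helper name {{set_output}} is hypothetical, and it only handles single-line values (multi-line values need the delimiter syntax, which is not shown).

```python
import os


def set_output(name, value):
    """Append a single-line step output to the file GitHub Actions provides."""
    value = str(value)
    if "\n" in value:
        # Multi-line values require the delimiter form, not covered by this sketch.
        raise ValueError("multi-line values are not supported by this helper")
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")
```

Outside of GitHub Actions the environment variable is not set, so this sketch would raise {{KeyError}}; a real script might want a fallback for local runs.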





[jira] [Created] (ARROW-18032) pyarrow no go with pip3 and py-3.11rc2

2022-10-13 Thread Aleksandar (Jira)
Aleksandar created ARROW-18032:
--

 Summary: pyarrow no go with pip3 and py-3.11rc2
 Key: ARROW-18032
 URL: https://issues.apache.org/jira/browse/ARROW-18032
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 9.0.0
Reporter: Aleksandar


I tried with version 9.0.0 and with testing versions.

Every time I get the same error:

CMake Error at 
/usr/local/share/cmake-3.21/Modules/FindPackageHandleStandardArgs.cmake:230 
(message):
        Could NOT find Python3 (missing: Python3_NumPy_INCLUDE_DIRS NumPy) 
(found
        version "3.11.0")

I am not sure where this can be changed in templates to enable support for 3.11

Regards,





[jira] [Created] (ARROW-18031) [C++][Parquet] Undefined behavior in boolean RLE decoder

2022-10-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-18031:
--

 Summary: [C++][Parquet] Undefined behavior in boolean RLE decoder
 Key: ARROW-18031
 URL: https://issues.apache.org/jira/browse/ARROW-18031
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Parquet
Reporter: Antoine Pitrou
 Fix For: 10.0.0


A fuzzing run found this undefined behavior, which hints that the RLE boolean 
decoder implementation is wrong:
{code}
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x77a45859 in __GI_abort () at abort.c:79
#2  0x5beafa07 in __sanitizer::Abort() ()
#3  0x5bead8a1 in __sanitizer::Die() ()
#4  0x5bec15cc in __ubsan::ScopedReport::~ScopedReport() ()
#5  0x5bec437b in handleLoadInvalidValue(__ubsan::InvalidValueData*, 
unsigned long, __ubsan::ReportOptions) ()
#6  0x5bec43be in __ubsan_handle_load_invalid_value_abort ()
#7  0x5c5acb9b in arrow::bit_util::BitReader::GetAligned 
(this=0x60701060, num_bytes=1, v=0x7fff99d0)
at /home/antoine/arrow/dev/cpp/src/arrow/util/bit_stream_utils.h:415
#8  0x5c5aa7d4 in arrow::util::RleDecoder::NextCounts 
(this=0x60701060) at 
/home/antoine/arrow/dev/cpp/src/arrow/util/rle_encoding.h:663
#9  0x5c5a7328 in arrow::util::RleDecoder::GetBatch 
(this=0x60701060, values=0x75408000, batch_size=2089)
at /home/antoine/arrow/dev/cpp/src/arrow/util/rle_encoding.h:329
#10 0x5c59834e in parquet::(anonymous 
namespace)::RleBooleanDecoder::Decode (this=0x60603ce0, 
buffer=0x75408000, max_values=2089)
at /home/antoine/arrow/dev/cpp/src/parquet/encoding.cc:2388
#11 0x5c4f43d9 in parquet::internal::(anonymous 
namespace)::TypedRecordReader 
>::ReadValuesDense (
this=0x61401050, values_to_read=2089) at 
/home/antoine/arrow/dev/cpp/src/parquet/column_reader.cc:1531
#12 0x5c4f7668 in parquet::internal::(anonymous 
namespace)::TypedRecordReader 
>::ReadRecordData (
this=0x61401050, num_records=2089) at 
/home/antoine/arrow/dev/cpp/src/parquet/column_reader.cc:1575
#13 0x5c4f03e5 in parquet::internal::(anonymous 
namespace)::TypedRecordReader 
>::ReadRecords (
this=0x61401050, num_records=2089) at 
/home/antoine/arrow/dev/cpp/src/parquet/column_reader.cc:1331
#14 0x5bf0acee in parquet::arrow::(anonymous 
namespace)::LeafReader::LoadBatch (this=0x60801020, records_to_read=2089)
at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:479
#15 0x5bf019df in parquet::arrow::ColumnReaderImpl::NextBatch 
(this=0x60801020, batch_size=2089, out=0x7fffb740)
at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:109
#16 0x5bf78829 in parquet::arrow::(anonymous 
namespace)::FileReaderImpl::ReadColumn (this=0x61301a80, i=0, 
row_groups=std::vector of length 1, capacity 1 = {...}, 
reader=0x60801020, out=0x7fffb740)
at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:285
#17 0x5bff1b9c in parquet::arrow::(anonymous 
namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr, std::vector > const&, 
std::vector > const&, 
arrow::internal::Executor*)::$_4::operator()(unsigned long, 
std::shared_ptr) const (this=0x7fffbdc0, 
i=0, reader=warning: RTTI symbol not found for class 
'std::_Sp_counted_deleter, std::allocator, 
(__gnu_cxx::_Lock_policy)2>'
warning: RTTI symbol not found for class 
'std::_Sp_counted_deleter, std::allocator, 
(__gnu_cxx::_Lock_policy)2>'

std::shared_ptr (use count 2, weak count 0) = 
{...}) at /home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:1236
#18 0x5bfed49d in 
arrow::internal::OptionalParallelForAsync, std::vector > const&, 
std::vector > const&, 
arrow::internal::Executor*)::$_4&, 
std::shared_ptr, 
std::shared_ptr >(bool, 
std::vector, 
std::allocator > >, 
parquet::arrow::(anonymous 
namespace)::FileReaderImpl::DecodeRowGroups(std::shared_ptr, std::vector > const&, 
std::vector > const&, 
arrow::internal::Executor*)::$_4&, arrow::internal::Executor*) 
(use_threads=false, inputs=std::vector of length 1, capacity 1 = {...}, 
func=..., executor=0x60402b90)
at /home/antoine/arrow/dev/cpp/src/arrow/util/parallel.h:95
#19 0x5bfebe4c in parquet::arrow::(anonymous 
namespace)::FileReaderImpl::DecodeRowGroups (this=0x61301a80, 
self=std::shared_ptr 
(empty) = {...}, row_groups=std::vector of length 1, capacity 1 = {...}, 
column_indices=std::vector of length 1, capacity 1 = {...}, 
cpu_executor=0x60402b90) at 
/home/antoine/arrow/dev/cpp/src/parquet/arrow/reader.cc:1254
#20 0x5bee0d57 in parquet::arrow::(anonymous 
namespace)::FileReaderImpl::ReadRowGroups (this=0x61301a80, 
row_groups=std::vector of length 1, capacity 1 = {...}, 
column_indices=std::vector of length 1, capacity 1 = {...}, out=0x7fffc880)
at 

[jira] [Created] (ARROW-18030) [C++] Bump lz4 to 1.9.4

2022-10-13 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-18030:
--

 Summary: [C++] Bump lz4 to 1.9.4
 Key: ARROW-18030
 URL: https://issues.apache.org/jira/browse/ARROW-18030
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 10.0.0


We currently vendor a development version of lz4 to get some required stability 
fixes.
We should bump to 1.9.4, which was recently released:
https://github.com/lz4/lz4/releases






[jira] [Created] (ARROW-18029) [Format] archery lint for cmake should show error details

2022-10-13 Thread Yaron Gvili (Jira)
Yaron Gvili created ARROW-18029:
---

 Summary: [Format] archery lint for cmake should show error details
 Key: ARROW-18029
 URL: https://issues.apache.org/jira/browse/ARROW-18029
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Reporter: Yaron Gvili


Here is example output from a failed invocation of `archery lint 
--cmake-format`:
 
INFO:archery:Running cmake-format linters
ERROR __main__.py:618: Check failed: 
/arrow/cpp/cmake_modules/ThirdpartyToolchain.cmake
 
It would be helpful to get the error details on failure, e.g., as a diff output 
like for C++. Granted, this may be low priority since `archery lint 
--cmake-format --fix` fixes the errors.
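
As a rough idea of what "error details as a diff" could look like, here is a sketch that uses the standard library's `difflib`. The function name `format_check_diff` and the way the original and formatted texts are obtained are assumptions for illustration; this is not how Archery or cmake-format is actually wired up.

```python
import difflib


def format_check_diff(path, original, formatted):
    """Return a unified diff between a file's current and formatted contents.

    An empty string means the file is already formatted.
    """
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        formatted.splitlines(keepends=True),
        fromfile=f"{path} (original)",
        tofile=f"{path} (formatted)",
    )
    return "".join(diff)
```

A linter wrapper could print this string next to the existing "Check failed" line, so the user sees which lines the formatter would change without running `--fix`.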





[jira] [Created] (ARROW-18028) [Dev][Archery][Crossbow] Always use GitHub Action's build URL in PR comment

2022-10-13 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-18028:


 Summary: [Dev][Archery][Crossbow] Always use GitHub Action's build 
URL in PR comment
 Key: ARROW-18028
 URL: https://issues.apache.org/jira/browse/ARROW-18028
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-18027) [Dev][Archery][Crossbow] Reuse GitHub Token

2022-10-13 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-18027:


 Summary: [Dev][Archery][Crossbow] Reuse GitHub Token
 Key: ARROW-18027
 URL: https://issues.apache.org/jira/browse/ARROW-18027
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-18026) [C++][Gandiva] Add div and mod functions for unsigned ints

2022-10-13 Thread Jin Shang (Jira)
Jin Shang created ARROW-18026:
-

 Summary: [C++][Gandiva] Add div and mod functions for unsigned ints
 Key: ARROW-18026
 URL: https://issues.apache.org/jira/browse/ARROW-18026
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva
Reporter: Jin Shang





