[jira] [Created] (ARROW-17813) [Python] Nested ExtensionArray conversion to/from pandas/numpy

2022-09-21 Thread Chang She (Jira)
Chang She created ARROW-17813:
-

 Summary: [Python] Nested ExtensionArray conversion to/from 
pandas/numpy
 Key: ARROW-17813
 URL: https://issues.apache.org/jira/browse/ARROW-17813
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 9.0.0
Reporter: Chang She


user@ thread: [https://lists.apache.org/thread/dhnxq0g4kgdysjowftfv3z5ngj780xpb]
repro gist: 
[https://gist.github.com/changhiskhan/4163f8cec675a2418a69ec9168d5fdd9]

*Arrow => numpy/pandas*

For a non-nested array, pa.ExtensionArray.to_numpy automatically "lowers" to 
the storage type (as expected). However, this is not done for nested arrays:

{code:python}
import pyarrow as pa

class LabelType(pa.ExtensionType):

    def __init__(self):
        super(LabelType, self).__init__(pa.string(), "label")

    def __arrow_ext_serialize__(self):
        return b""

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return LabelType()

storage = pa.array(["dog", "cat", "horse"])
ext_arr = pa.ExtensionArray.from_storage(LabelType(), storage)
offsets = pa.array([0, 1])
list_arr = pa.ListArray.from_arrays(offsets, ext_arr)
list_arr.to_numpy()
{code}
{code:java}
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In [15], line 1
----> 1 list_arr.to_numpy()

File /mnt/lance/.venv/lance/lib/python3.10/site-packages/pyarrow/array.pxi:1445, in pyarrow.lib.Array.to_numpy()

File /mnt/lance/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()

ArrowNotImplementedError: Not implemented type for Arrow list to pandas: 
extension>
{code}

As mentioned on the user thread linked at the top, a fairly generic solution 
would be to have the conversion default to the storage array's to_numpy.
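
In the meantime, a possible workaround (an untested sketch continuing the repro 
above; it assumes the list offsets can be recombined directly with the extension 
values' storage, which holds for this unsliced example) is to rebuild the list 
array over the storage array before converting:

{code:python}
# Workaround sketch: swap the extension values for their storage array,
# then convert the resulting plain list<string> array instead.
plain_list = pa.ListArray.from_arrays(list_arr.offsets, list_arr.values.storage)
plain_list.to_numpy(zero_copy_only=False)  # object ndarray of Python lists
{code}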

 
*pandas/numpy => Arrow*

Equivalently, conversion to Arrow is also difficult for nested extension types: 
if I have, say, a pandas DataFrame that has a column of list-of-string and I want 
to convert that to a list-of-label array, currently I have to:
1. Convert the list-of-string (storage) numpy array to pa.list_(pa.string())
2. Convert the string values array to an ExtensionArray, then reconstitute a 
list array using the ExtensionArray combined with the offsets from 
the result of step 1

{code:python}
import pyarrow as pa
import pandas as pd

df = pd.DataFrame({'labels': [["dog", "horse", "cat"],
                              ["person", "person", "car", "car"]]})
list_of_storage = pa.array(df.labels)
ext_values = pa.ExtensionArray.from_storage(LabelType(), list_of_storage.values)
list_of_ext = pa.ListArray.from_arrays(offsets=list_of_storage.offsets,
                                       values=ext_values)
{code}
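
Packaged as a reusable helper, the two steps look like this (a hypothetical 
convenience function, not an existing pyarrow API; it assumes the series holds 
list-of-storage values matching the extension's storage type):

{code:python}
import pandas as pd
import pyarrow as pa

def list_column_to_ext(series: pd.Series, ext_type: pa.ExtensionType) -> pa.ListArray:
    # Step 1: convert the pandas list-of-storage column to a list<storage> array.
    storage_list = pa.array(series)
    # Step 2: wrap the flat values as the extension type, then rebuild the
    # list array from the original offsets.
    ext_values = pa.ExtensionArray.from_storage(ext_type, storage_list.values)
    return pa.ListArray.from_arrays(offsets=storage_list.offsets, values=ext_values)

list_of_ext = list_column_to_ext(df.labels, LabelType())
{code}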


For non-nested columns, one can achieve easier conversion by defining a pandas 
extension dtype, but I don't think that works for a nested column. You would 
instead have to fall back to something like `pa.ExtensionArray.from_storage` (or 
`from_pandas`?) to do the trick. Even that doesn't necessarily work for 
something like a dictionary column, because you'd have to pass in the dictionary 
somehow. Off the cuff, one could provide a custom lambda to 
`pa.Table.from_pandas` that is applied to specified column names or data 
types?


Thanks in advance for the consideration!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17812) [C++][Documentation] Add Gandiva User Guide

2022-09-21 Thread Will Jones (Jira)
Will Jones created ARROW-17812:
--

 Summary: [C++][Documentation] Add Gandiva User Guide
 Key: ARROW-17812
 URL: https://issues.apache.org/jira/browse/ARROW-17812
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva
Reporter: Will Jones
Assignee: Will Jones
 Fix For: 10.0.0






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17811) [Doc][Java] Document how dictionary encoding works

2022-09-21 Thread Larry White (Jira)
Larry White created ARROW-17811:
---

 Summary: [Doc][Java] Document how dictionary encoding works
 Key: ARROW-17811
 URL: https://issues.apache.org/jira/browse/ARROW-17811
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Java
Affects Versions: 9.0.0
Reporter: Larry White


The ValueVector documentation does not include any discussion of dictionary 
encoding. There is example code on the IPC page 
https://arrow.apache.org/docs/dev/java/ipc.html, but it doesn't provide an 
overview. 
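
For reference, the core idea such a guide would need to explain is that a 
dictionary-encoded vector stores small integer indices into a separate 
dictionary of distinct values. A conceptual sketch, shown with pyarrow purely 
for brevity (the Java API exposes the same model through DictionaryEncoder; 
this is not the Java example code itself):

{code:python}
import pyarrow as pa

arr = pa.array(["dog", "cat", "dog", "dog"])
dict_arr = arr.dictionary_encode()

print(dict_arr.indices)     # [0, 1, 0, 0] -- one small index per value
print(dict_arr.dictionary)  # ["dog", "cat"] -- the distinct values
{code}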



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17810) [Java] Update JaCoCo to 0.8.8 for Java 18 support in CI

2022-09-21 Thread David Li (Jira)
David Li created ARROW-17810:


 Summary: [Java] Update JaCoCo to 0.8.8 for Java 18 support in CI
 Key: ARROW-17810
 URL: https://issues.apache.org/jira/browse/ARROW-17810
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li
Assignee: David Li


Not sure why this didn't fail before, but we need to bump JaCoCo for Java 18 to 
work:

{noformat}
java.lang.instrument.IllegalClassFormatException: Error while instrumenting org/apache/calcite/avatica/AvaticaConnection$MockitoMock$854659140$auxiliary$kA4H37GT.
at org.jacoco.agent.rt.internal_3570298.CoverageTransformer.transform(CoverageTransformer.java:94)
at java.instrument/java.lang.instrument.ClassFileTransformer.transform(ClassFileTransformer.java:244)
at java.instrument/sun.instrument.TransformerManager.transform(TransformerManager.java:188)
at java.instrument/sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:541)
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1013)
at java.base/java.lang.ClassLoader$ByteBuddyAccessor$PXg8JwS3.defineClass(Unknown Source)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:577)
at net.bytebuddy.dynamic.loading.ClassInjector$UsingReflection$Dispatcher$UsingUnsafeInjection.defineClass(ClassInjector.java:1027)
at net.bytebuddy.dynamic.loading.ClassInjector$UsingReflection.injectRaw(ClassInjector.java:279)
at net.bytebuddy.dynamic.loading.ClassInjector$AbstractBase.inject(ClassInjector.java:114)
at net.bytebuddy.dynamic.loading.ClassLoadingStrategy$Default$InjectionDispatcher.load(ClassLoadingStrategy.java:233)
at net.bytebuddy.dynamic.TypeResolutionStrategy$Passive.initialize(TypeResolutionStrategy.java:100)
at net.bytebuddy.dynamic.DynamicType$Default$Unloaded.load(DynamicType.java:6154)
at org.mockito.internal.creation.bytebuddy.SubclassBytecodeGenerator.mockClass(SubclassBytecodeGenerator.java:268)
at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.lambda$mockClass$0(TypeCachingBytecodeGenerator.java:47)
at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:153)
at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:366)
at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:175)
at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:377)
at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.mockClass(TypeCachingBytecodeGenerator.java:40)
at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.mockClass(InlineBytecodeGenerator.java:216)
at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.lambda$mockClass$0(TypeCachingBytecodeGenerator.java:47)
at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:153)
at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:366)
at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:175)
at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:377)
at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.mockClass(TypeCachingBytecodeGenerator.java:40)
at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.createMockType(InlineDelegateByteBuddyMockMaker.java:391)
at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.doCreateMock(InlineDelegateByteBuddyMockMaker.java:351)
at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.createMock(InlineDelegateByteBuddyMockMaker.java:330)
at org.mockito.internal.creation.bytebuddy.InlineByteBuddyMockMaker.createMock(InlineByteBuddyMockMaker.java:58)
at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:53)
at org.mockito.internal.MockitoCore.mock(MockitoCore.java:84)
at org.mockito.Mockito.mock(Mockito.java:1964)
at org.mockito.internal.configuration.MockAnnotationProcessor.processAnnotationForMock(MockAnnotationProcessor.java:66)
at org.mockito.internal.configuration.MockAnnotationProcessor.process(MockAnnotationProcessor.java:27)
at org.mockito.internal.configuration.MockAnnotationProcessor.process(MockAnnotationProcessor.java:24)
at org.mockito.internal.configuration.IndependentAnnotationEngine.createMockFor(IndependentAnnotationEngine.java:45)
at org.mockito.internal.configuration.IndependentAnnotationEngine.process(IndependentAnnotationEngine.java:73)
at 

[jira] [Created] (ARROW-17809) [R] DuckDB test is failing (again) with new duckdb release

2022-09-21 Thread Dewey Dunnington (Jira)
Dewey Dunnington created ARROW-17809:


 Summary: [R] DuckDB test is failing (again) with new duckdb release
 Key: ARROW-17809
 URL: https://issues.apache.org/jira/browse/ARROW-17809
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Dewey Dunnington


It looks like the fix that I thought would work in DuckDB did not, in fact, fix 
the error! The previous ticket, ARROW-17643, just skipped the test until the 
new release of duckdb (which just happened).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17808) [C#] FixedSizeList implementation is missing

2022-09-21 Thread helmi (Jira)
helmi created ARROW-17808:
-

 Summary: [C#] FixedSizeList implementation is missing
 Key: ARROW-17808
 URL: https://issues.apache.org/jira/browse/ARROW-17808
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C#
Affects Versions: 9.0.0
Reporter: helmi


Hi,

I'm working on integrating Apache Arrow C# and found out that FixedSizeList is 
not implemented. Is there a plan to implement the missing type? Otherwise, what 
are my options to work around this issue?

https://issues.apache.org/jira/browse/ARROW-17644



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17807) [C++] Regenerate Flatbuffers files for C++17

2022-09-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17807:
--

 Summary: [C++] Regenerate Flatbuffers files for C++17
 Key: ARROW-17807
 URL: https://issues.apache.org/jira/browse/ARROW-17807
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 10.0.0


We should enable C++17 features in the generated Flatbuffers sources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17806) pyarrow fails to write and read a dataframe with MultiIndex containing a RangeIndex with Pandas 1.5.0

2022-09-21 Thread Gianluca Ficarelli (Jira)
Gianluca Ficarelli created ARROW-17806:
--

 Summary: pyarrow fails to write and read a dataframe with 
MultiIndex containing a RangeIndex with Pandas 1.5.0
 Key: ARROW-17806
 URL: https://issues.apache.org/jira/browse/ARROW-17806
 Project: Apache Arrow
  Issue Type: Bug
  Components: Parquet, Python
Affects Versions: 9.0.0
Reporter: Gianluca Ficarelli


A dataframe with a MultiIndex built in this way:
{code:java}
import pandas as pd

df1 = pd.DataFrame({"a": [10, 11, 12], "b": [20, 21, 22]},
                   index=pd.RangeIndex(3, name="idx0"))
df1 = df1.set_index("b", append=True)
print(df1)
print(df1.index.get_level_values("idx0")) {code}
gives with Pandas 1.5.0:
{code:java}
          a
idx0 b     
0    20  10
1    21  11
2    22  12

RangeIndex(start=0, stop=3, step=1, name='idx0'){code}
while with Pandas 1.4.4:
{code:java}
          a
idx0 b     
0    20  10
1    21  11
2    22  12

Int64Index([0, 1, 2], dtype='int64', name='idx0'){code}
i.e. the result is RangeIndex instead of Int64Index.

With pandas 1.5.0 and pyarrow 9.0.0, writing this DataFrame with index=None 
(i.e. the default value) as in:
{code:java}
df1.to_parquet(path, engine="pyarrow", index=None) {code}
then reading the same file with:
{code:java}
pd.read_parquet(path, engine="pyarrow") {code}
raises an exception:
{code:java}
 File //lib/python3.9/site-packages/pyarrow/pandas_compat.py:997, in _extract_index_level(table, result_table, field_name, field_name_to_metadata)
    995 def _extract_index_level(table, result_table, field_name,
    996                          field_name_to_metadata):
--> 997     logical_name = field_name_to_metadata[field_name]['name']
    998     index_name = _backwards_compatible_index_name(field_name, logical_name)
    999     i = table.schema.get_field_index(field_name)

KeyError: 'b'
{code}
while with pandas 1.4.4 and pyarrow 9.0.0 it works correctly. 

Note that the problem disappears if the parquet file is written with index=True 
(which is not the default value), probably because the RangeIndex is converted 
to Int64Index:
{code:java}
df1.to_parquet(path, engine="pyarrow", index=True)  {code}
I suspect that the issue is caused by the change from Int64Index to RangeIndex, 
and it may be related to [https://github.com/pandas-dev/pandas/issues/46675].

Should pyarrow be able to handle this case? Or is it an issue with Pandas?
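
In the meantime, a possible workaround (an untested sketch continuing the repro 
above; it assumes materializing the RangeIndex level as a concrete index is 
acceptable) is to normalize all index levels to int64 before writing:

{code:python}
import pandas as pd

# Hypothetical workaround sketch: force every index level (including the
# RangeIndex level) to a materialized int64 index, so pyarrow stores it
# as an ordinary index column.
df1.index = pd.MultiIndex.from_arrays(
    [df1.index.get_level_values(i).astype("int64")
     for i in range(df1.index.nlevels)],
    names=df1.index.names,
)
df1.to_parquet(path, engine="pyarrow", index=None)
{code}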



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17805) [C++][CI] Use Brew installed clang for MacOS

2022-09-21 Thread Jin Shang (Jira)
Jin Shang created ARROW-17805:
-

 Summary: [C++][CI] Use Brew installed clang for MacOS
 Key: ARROW-17805
 URL: https://issues.apache.org/jira/browse/ARROW-17805
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 9.0.0
Reporter: Jin Shang
 Fix For: 10.0.0


We also need to solve compatibility issues with clang-15.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17804) [Go][CSV] Add Date32 and Time32 parsers

2022-09-21 Thread Matthew Topol (Jira)
Matthew Topol created ARROW-17804:
-

 Summary: [Go][CSV] Add Date32 and Time32 parsers
 Key: ARROW-17804
 URL: https://issues.apache.org/jira/browse/ARROW-17804
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Matthew Topol
Assignee: Matthew Topol






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17803) [C++] Use [[nodiscard]]

2022-09-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17803:
--

 Summary: [C++] Use [[nodiscard]]
 Key: ARROW-17803
 URL: https://issues.apache.org/jira/browse/ARROW-17803
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 10.0.0


We currently have an {{ARROW_MUST_USE_TYPE}} macro that's only enabled on 
clang-based builds.
Instead we can use the {{[[nodiscard]]}} attribute, which is standard in C++17.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17802) Merging multi file datasets on particular columns that are present in all the datasets.

2022-09-21 Thread N Gautam Animesh (Jira)
N Gautam Animesh created ARROW-17802:


 Summary: Merging multi file datasets on particular columns that 
are present in all the datasets.
 Key: ARROW-17802
 URL: https://issues.apache.org/jira/browse/ARROW-17802
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: N Gautam Animesh


While working with multi-file datasets, I came across an issue where I wanted 
to merge specific columns from all the datasets and work on them.
Since I was not able to do so, I want to know whether there is any workaround 
for merging multi-file datasets on some specific columns?
Please look into it and do let me know if there's anything regarding this.
{code:java}
system.time({
  df <- open_dataset('C:/Test/Files/test', format = "arrow")
  df <- df %>% collect()
  # merging logic so as to select only specified column(s)
  # write_dataset(df, 'C:/Test/Files/test', format = "arrow")
}) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17801) [Doc][Java] Fix typos in slice page in Cookbook

2022-09-21 Thread Larry White (Jira)
Larry White created ARROW-17801:
---

 Summary: [Doc][Java] Fix typos in slice page in Cookbook 
 Key: ARROW-17801
 URL: https://issues.apache.org/jira/browse/ARROW-17801
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 9.0.0
Reporter: Larry White
Assignee: Larry White


The slice instructions say "splice" in a couple of places.

Check for other typos as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17800) [C++] Failure in jemalloc stats tests

2022-09-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17800:
--

 Summary: [C++] Failure in jemalloc stats tests
 Key: ARROW-17800
 URL: https://issues.apache.org/jira/browse/ARROW-17800
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 10.0.0


I just got this when running the tests locally:
{code}
[--] 2 tests from Jemalloc
[ RUN  ] Jemalloc.SetDirtyPageDecayMillis
[   OK ] Jemalloc.SetDirtyPageDecayMillis (0 ms)
[ RUN  ] Jemalloc.GetAllocationStats
/home/antoine/arrow/dev/cpp/src/arrow/memory_pool_test.cc:218: Failure
The difference between metadata0 and 3000000 is 2962256, which exceeds 1000000, where
metadata0 evaluates to 5962256,
3000000 evaluates to 3000000, and
1000000 evaluates to 1000000.
[  FAILED  ] Jemalloc.GetAllocationStats (0 ms)
[--] 2 tests from Jemalloc (0 ms total)
{code}

It looks like those checks should be relaxed to allow for more 
context-dependent behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17799) [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to Parquet writer

2022-09-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17799:
--

 Summary: [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to 
Parquet writer
 Key: ARROW-17799
 URL: https://issues.apache.org/jira/browse/ARROW-17799
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Parquet
Reporter: Rok Mihevc


We need to add a DELTA_LENGTH_BYTE_ARRAY encoder to implement the 
DELTA_BYTE_ARRAY encoder (ARROW-17619).
ARROW-13388 already implemented the DELTA_LENGTH_BYTE_ARRAY decoder.
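
For context, DELTA_LENGTH_BYTE_ARRAY separates a column of byte arrays into the 
value lengths (which are then DELTA_BINARY_PACKED-encoded) followed by the 
concatenated value bytes. A conceptual sketch of that split, ignoring the 
actual bit-packed wire format:

{code:python}
def delta_length_byte_array_concept(values):
    """Split byte-array values the way DELTA_LENGTH_BYTE_ARRAY does:
    the lengths (to be delta-encoded) plus one concatenated data buffer."""
    lengths = [len(v) for v in values]
    data = b"".join(values)
    return lengths, data

# e.g. [b"dog", b"horse"] -> ([3, 5], b"doghorse")
{code}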



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17798) [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer

2022-09-21 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-17798:
--

 Summary: [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet 
writer
 Key: ARROW-17798
 URL: https://issues.apache.org/jira/browse/ARROW-17798
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Parquet
Reporter: Rok Mihevc


We need to add a DELTA_BINARY_PACKED encoder to implement the DELTA_BYTE_ARRAY 
encoder (ARROW-17619).
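
For context, DELTA_BINARY_PACKED stores a first value followed by bit-packed 
deltas between consecutive values. A conceptual sketch of the delta step (the 
real format additionally groups deltas into blocks and miniblocks for 
bit-packing, which is omitted here):

{code:python}
def delta_binary_packed_concept(values):
    """Compute the first-value + deltas representation that
    DELTA_BINARY_PACKED bit-packs (bit-packing itself omitted)."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas

# e.g. [7, 5, 3, 1, 2, 3] -> (7, [-2, -2, -2, 1, 1])
{code}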



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17797) [Java] Remove deprecated methods from Java dataset module in Arrow 11

2022-09-21 Thread David Li (Jira)
David Li created ARROW-17797:


 Summary: [Java] Remove deprecated methods from Java dataset module 
in Arrow 11
 Key: ARROW-17797
 URL: https://issues.apache.org/jira/browse/ARROW-17797
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: David Li


ARROW-15745 deprecated some things in the Dataset module which should be 
removed for Arrow >= 11.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17796) Using cbind when merging multi datasets using open_dataset on a directory.

2022-09-21 Thread N Gautam Animesh (Jira)
N Gautam Animesh created ARROW-17796:


 Summary: Using cbind when merging multi datasets using 
open_dataset on a directory.
 Key: ARROW-17796
 URL: https://issues.apache.org/jira/browse/ARROW-17796
 Project: Apache Arrow
  Issue Type: Task
Reporter: N Gautam Animesh


I was wondering if we can use cbind, stating particular column names, when 
merging multiple datasets using open_dataset(), so that we bind only those 
particular columns.

I was using open_dataset to read multiple datasets in a particular directory and 
wanted to merge these datasets based on some particular columns that are 
common to all the datasets.

Is it possible to merge these datasets column-wise, since by default 
open_dataset is merging all the datasets one after the other row-wise?

Do let me know if there's anything like this or any other workaround.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17795) [C++][R] Using ARROW_ZSTD_USE_SHARED fails

2022-09-21 Thread Jacob Wujciak-Jens (Jira)
Jacob Wujciak-Jens created ARROW-17795:
--

 Summary: [C++][R] Using ARROW_ZSTD_USE_SHARED fails
 Key: ARROW-17795
 URL: https://issues.apache.org/jira/browse/ARROW-17795
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, R
Reporter: Jacob Wujciak-Jens
 Fix For: 10.0.0


See the zulip discussion 
[here|https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/zstd.20cmake.20changes].

Changes to the zstd find module cause a failure when ARROW_ZSTD_USE_SHARED is 
used.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17794) [Java] Force delete jni lib on JVM exit

2022-09-21 Thread Jackey Lee (Jira)
Jackey Lee created ARROW-17794:
--

 Summary: [Java] Force delete jni lib on JVM exit
 Key: ARROW-17794
 URL: https://issues.apache.org/jira/browse/ARROW-17794
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 10.0.0
Reporter: Jackey Lee


Use `FileUtils.forceDeleteOnExit` to delete the JNI lib file on JVM exit. 
`FileUtils.forceDeleteOnExit` actually adds a shutdown hook to make sure the 
file is deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17793) [C++] Adding Union Relation ToProto

2022-09-21 Thread Vibhatha Lakmal Abeykoon (Jira)
Vibhatha Lakmal Abeykoon created ARROW-17793:


 Summary: [C++] Adding Union Relation ToProto
 Key: ARROW-17793
 URL: https://issues.apache.org/jira/browse/ARROW-17793
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Vibhatha Lakmal Abeykoon
Assignee: Vibhatha Lakmal Abeykoon


The union relation also requires an Arrow->Substrait converter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17792) [C++] Use lambda capture move construction

2022-09-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-17792:
--

 Summary: [C++] Use lambda capture move construction
 Key: ARROW-17792
 URL: https://issues.apache.org/jira/browse/ARROW-17792
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou


C++17 allows us to move-construct captured lambda variables, whereas before we 
had to write functors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17791) [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket

2022-09-21 Thread Raúl Cumplido (Jira)
Raúl Cumplido created ARROW-17791:
-

 Summary: [Python][CI] Some nightly jobs are failing due to 
ACCESS_DENIED to S3 bucket
 Key: ARROW-17791
 URL: https://issues.apache.org/jira/browse/ARROW-17791
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, Python
Reporter: Raúl Cumplido


The following nightly jobs failed:
 * 
[test-conda-python-3.10|https://github.com/ursacomputing/crossbow/actions/runs/3094438413/jobs/5007812721]
 * 
[test-conda-python-3.7|https://github.com/ursacomputing/crossbow/actions/runs/3094412849/jobs/5007760110]
 * 
[test-conda-python-3.7-pandas-0.24|https://github.com/ursacomputing/crossbow/actions/runs/3094422644/jobs/5007779545]
 * 
[test-conda-python-3.7-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094419759/jobs/5007773935]
 * 
[test-conda-python-3.8|https://github.com/ursacomputing/crossbow/actions/runs/309904/jobs/5007827002]
 * 
[test-conda-python-3.8-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094405494/jobs/5007746062]
 * 
[test-conda-python-3.8-pandas-nightly|https://github.com/ursacomputing/crossbow/actions/runs/3094407475/jobs/5007750212]
 * 
[test-conda-python-3.9|https://github.com/ursacomputing/crossbow/actions/runs/3094450745/jobs/5007839959]
 * 
[test-conda-python-3.9-pandas-master|https://github.com/ursacomputing/crossbow/actions/runs/3094401032/jobs/5007736715]
 * 
[test-debian-11-python-3|https://github.com/ursacomputing/crossbow/runs/8465194776]

The Python test test_s3_real_aws_region_selection failed with ACCESS_DENIED:
{code:java}
=== FAILURES ===
__ test_s3_real_aws_region_selection __

    @pytest.mark.s3
    def test_s3_real_aws_region_selection():
        # Taken from a registry of open S3-hosted datasets
        # at https://github.com/awslabs/open-data-registry
        fs, path = FileSystem.from_uri('s3://mf-nwp-models/README.txt')
        assert fs.region == 'eu-west-1'
>       with fs.open_input_stream(path) as f:

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_fs.py:1660:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/_fs.pyx:805: in pyarrow._fs.FileSystem.open_input_stream
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   OSError: When reading information for key 'README.txt' in bucket 'mf-nwp-models': AWS Error ACCESS_DENIED during HeadObject operation: No response body

pyarrow/error.pxi:115: OSError {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17790) [C++][Gandiva] Adapt to LLVM opaque pointer

2022-09-21 Thread Jin Shang (Jira)
Jin Shang created ARROW-17790:
-

 Summary: [C++][Gandiva] Adapt to LLVM opaque pointer
 Key: ARROW-17790
 URL: https://issues.apache.org/jira/browse/ARROW-17790
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva
Affects Versions: 9.0.0
Reporter: Jin Shang
 Fix For: 10.0.0


Starting from LLVM 13, LLVM IR has been shifting towards a unified opaque 
pointer type, i.e. pointers without pointee types. LLVM has provided workarounds 
until LLVM 15; those temporary workarounds need to be replaced in order to 
support LLVM 15 and onwards.

For more background info, see [https://llvm.org/docs/OpaquePointers.html] and 
[https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)