[jira] [Created] (ARROW-17813) [Python] Nested ExtensionArray conversion to/from pandas/numpy
Chang She created ARROW-17813: - Summary: [Python] Nested ExtensionArray conversion to/from pandas/numpy Key: ARROW-17813 URL: https://issues.apache.org/jira/browse/ARROW-17813 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 9.0.0 Reporter: Chang She

user@ thread: [https://lists.apache.org/thread/dhnxq0g4kgdysjowftfv3z5ngj780xpb]
repro gist: [https://gist.github.com/changhiskhan/4163f8cec675a2418a69ec9168d5fdd9]

*Arrow => numpy/pandas*

For a non-nested array, pa.ExtensionArray.to_numpy automatically "lowers" to the storage type (as expected). However, this is not done for nested arrays:

{code:python}
import pyarrow as pa

class LabelType(pa.ExtensionType):
    def __init__(self):
        super(LabelType, self).__init__(pa.string(), "label")

    def __arrow_ext_serialize__(self):
        return b""

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return LabelType()

storage = pa.array(["dog", "cat", "horse"])
ext_arr = pa.ExtensionArray.from_storage(LabelType(), storage)
offsets = pa.array([0, 1])
list_arr = pa.ListArray.from_arrays(offsets, ext_arr)
list_arr.to_numpy()
{code}

{code:java}
---
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In [15], line 1
----> 1 list_arr.to_numpy()

File /mnt/lance/.venv/lance/lib/python3.10/site-packages/pyarrow/array.pxi:1445, in pyarrow.lib.Array.to_numpy()

File /mnt/lance/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()

ArrowNotImplementedError: Not implemented type for Arrow list to pandas: extension>
{code}

As mentioned on the user thread linked above, a fairly generic solution would be to have the conversion default to the storage array's to_numpy.

*pandas/numpy => Arrow*

Equivalently, conversion to Arrow is also difficult for nested extension types: say I have a pandas DataFrame with a column of list-of-string and I want to convert it to a list-of-label Array. Currently I have to:
1. Convert the list-of-string (storage) numpy array to pa.list_(pa.string()).
2. Convert the string values array to an ExtensionArray, then reconstitute a list array using that ExtensionArray combined with the offsets from the result of step 1.

{code:python}
import pyarrow as pa
import pandas as pd

df = pd.DataFrame({'labels': [["dog", "horse", "cat"],
                              ["person", "person", "car", "car"]]})
list_of_storage = pa.array(df.labels)
ext_values = pa.ExtensionArray.from_storage(LabelType(), list_of_storage.values)
list_of_ext = pa.ListArray.from_arrays(offsets=list_of_storage.offsets, values=ext_values)
{code}

For non-nested columns, one can achieve easier conversion by defining a pandas extension dtype, but I don't think that works for a nested column. You would instead have to fall back to something like `pa.ExtensionArray.from_storage` (or `from_pandas`?) to do the trick. Even that doesn't necessarily work for something like a dictionary column, because you'd have to pass in the dictionary somehow. Off the cuff, one could provide a custom lambda to `pa.Table.from_pandas` that is applied to specified column names / data types? Thanks in advance for the consideration! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17812) [C++][Documentation] Add Gandiva User Guide
Will Jones created ARROW-17812: -- Summary: [C++][Documentation] Add Gandiva User Guide Key: ARROW-17812 URL: https://issues.apache.org/jira/browse/ARROW-17812 Project: Apache Arrow Issue Type: Improvement Components: C++ - Gandiva Reporter: Will Jones Assignee: Will Jones Fix For: 10.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17811) [Doc][Java] Document how dictionary encoding works
Larry White created ARROW-17811: --- Summary: [Doc][Java] Document how dictionary encoding works Key: ARROW-17811 URL: https://issues.apache.org/jira/browse/ARROW-17811 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Java Affects Versions: 9.0.0 Reporter: Larry White The ValueVector documentation does not include any discussion of dictionary encoding. There is example code on the IPC page https://arrow.apache.org/docs/dev/java/ipc.html, but it doesn't provide an overview. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17810) [Java] Update JaCoCo to 0.8.8 for Java 18 support in CI
David Li created ARROW-17810: Summary: [Java] Update JaCoCo to 0.8.8 for Java 18 support in CI Key: ARROW-17810 URL: https://issues.apache.org/jira/browse/ARROW-17810 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: David Li Assignee: David Li

Not sure why this didn't fail before, but we need to bump JaCoCo for Java 18 to work:

{noformat}
java.lang.instrument.IllegalClassFormatException: Error while instrumenting org/apache/calcite/avatica/AvaticaConnection$MockitoMock$854659140$auxiliary$kA4H37GT.
    at org.jacoco.agent.rt.internal_3570298.CoverageTransformer.transform(CoverageTransformer.java:94)
    at java.instrument/java.lang.instrument.ClassFileTransformer.transform(ClassFileTransformer.java:244)
    at java.instrument/sun.instrument.TransformerManager.transform(TransformerManager.java:188)
    at java.instrument/sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:541)
    at java.base/java.lang.ClassLoader.defineClass1(Native Method)
    at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1013)
    at java.base/java.lang.ClassLoader$ByteBuddyAccessor$PXg8JwS3.defineClass(Unknown Source)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:577)
    at net.bytebuddy.dynamic.loading.ClassInjector$UsingReflection$Dispatcher$UsingUnsafeInjection.defineClass(ClassInjector.java:1027)
    at net.bytebuddy.dynamic.loading.ClassInjector$UsingReflection.injectRaw(ClassInjector.java:279)
    at net.bytebuddy.dynamic.loading.ClassInjector$AbstractBase.inject(ClassInjector.java:114)
    at net.bytebuddy.dynamic.loading.ClassLoadingStrategy$Default$InjectionDispatcher.load(ClassLoadingStrategy.java:233)
    at net.bytebuddy.dynamic.TypeResolutionStrategy$Passive.initialize(TypeResolutionStrategy.java:100)
    at net.bytebuddy.dynamic.DynamicType$Default$Unloaded.load(DynamicType.java:6154)
    at org.mockito.internal.creation.bytebuddy.SubclassBytecodeGenerator.mockClass(SubclassBytecodeGenerator.java:268)
    at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.lambda$mockClass$0(TypeCachingBytecodeGenerator.java:47)
    at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:153)
    at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:366)
    at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:175)
    at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:377)
    at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.mockClass(TypeCachingBytecodeGenerator.java:40)
    at org.mockito.internal.creation.bytebuddy.InlineBytecodeGenerator.mockClass(InlineBytecodeGenerator.java:216)
    at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.lambda$mockClass$0(TypeCachingBytecodeGenerator.java:47)
    at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:153)
    at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:366)
    at net.bytebuddy.TypeCache.findOrInsert(TypeCache.java:175)
    at net.bytebuddy.TypeCache$WithInlineExpunction.findOrInsert(TypeCache.java:377)
    at org.mockito.internal.creation.bytebuddy.TypeCachingBytecodeGenerator.mockClass(TypeCachingBytecodeGenerator.java:40)
    at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.createMockType(InlineDelegateByteBuddyMockMaker.java:391)
    at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.doCreateMock(InlineDelegateByteBuddyMockMaker.java:351)
    at org.mockito.internal.creation.bytebuddy.InlineDelegateByteBuddyMockMaker.createMock(InlineDelegateByteBuddyMockMaker.java:330)
    at org.mockito.internal.creation.bytebuddy.InlineByteBuddyMockMaker.createMock(InlineByteBuddyMockMaker.java:58)
    at org.mockito.internal.util.MockUtil.createMock(MockUtil.java:53)
    at org.mockito.internal.MockitoCore.mock(MockitoCore.java:84)
    at org.mockito.Mockito.mock(Mockito.java:1964)
    at org.mockito.internal.configuration.MockAnnotationProcessor.processAnnotationForMock(MockAnnotationProcessor.java:66)
    at org.mockito.internal.configuration.MockAnnotationProcessor.process(MockAnnotationProcessor.java:27)
    at org.mockito.internal.configuration.MockAnnotationProcessor.process(MockAnnotationProcessor.java:24)
    at org.mockito.internal.configuration.IndependentAnnotationEngine.createMockFor(IndependentAnnotationEngine.java:45)
    at org.mockito.internal.configuration.IndependentAnnotationEngine.process(IndependentAnnotationEngine.java:73)
{noformat}
[jira] [Created] (ARROW-17809) [R] DuckDB test is failing (again) with new duckdb release
Dewey Dunnington created ARROW-17809: Summary: [R] DuckDB test is failing (again) with new duckdb release Key: ARROW-17809 URL: https://issues.apache.org/jira/browse/ARROW-17809 Project: Apache Arrow Issue Type: Bug Components: R Reporter: Dewey Dunnington It looks like the fix that I thought would work in DuckDB did not, in fact, fix the error! The previous ticket, ARROW-17643, just skipped the test until the new release of duckdb (which just happened). -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17808) [C#] FixedSizeList implementation is missing
helmi created ARROW-17808: - Summary: [C#] FixedSizeList implementation is missing Key: ARROW-17808 URL: https://issues.apache.org/jira/browse/ARROW-17808 Project: Apache Arrow Issue Type: Improvement Components: C# Affects Versions: 9.0.0 Reporter: helmi Hi, I'm working on integrating Apache Arrow C# and I found that FixedSizeList is not implemented. Is there a plan to implement the missing type? Otherwise, what are my options to work around this issue? https://issues.apache.org/jira/browse/ARROW-17644 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17807) [C++] Regenerate Flatbuffers files for C++17
Antoine Pitrou created ARROW-17807: -- Summary: [C++] Regenerate Flatbuffers files for C++17 Key: ARROW-17807 URL: https://issues.apache.org/jira/browse/ARROW-17807 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Antoine Pitrou Fix For: 10.0.0 We should enable C++17 features in the generated Flatbuffers sources. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17806) pyarrow fails to write and read a dataframe with MultiIndex containing a RangeIndex with Pandas 1.5.0
Gianluca Ficarelli created ARROW-17806: -- Summary: pyarrow fails to write and read a dataframe with MultiIndex containing a RangeIndex with Pandas 1.5.0 Key: ARROW-17806 URL: https://issues.apache.org/jira/browse/ARROW-17806 Project: Apache Arrow Issue Type: Bug Components: Parquet, Python Affects Versions: 9.0.0 Reporter: Gianluca Ficarelli

A dataframe with a MultiIndex built in this way:

{code:python}
import pandas as pd

df1 = pd.DataFrame({"a": [10, 11, 12], "b": [20, 21, 22]},
                   index=pd.RangeIndex(3, name="idx0"))
df1 = df1.set_index("b", append=True)
print(df1)
print(df1.index.get_level_values("idx0"))
{code}

gives with Pandas 1.5.0:

{code:java}
         a
idx0 b
0    20  10
1    21  11
2    22  12
RangeIndex(start=0, stop=3, step=1, name='idx0')
{code}

while with Pandas 1.4.4:

{code:java}
         a
idx0 b
0    20  10
1    21  11
2    22  12
Int64Index([0, 1, 2], dtype='int64', name='idx0')
{code}

i.e. the result is a RangeIndex instead of an Int64Index. With pandas 1.5.0 and pyarrow 9.0.0, writing this DataFrame with index=None (i.e. the default value) as in:

{code:python}
df1.to_parquet(path, engine="pyarrow", index=None)
{code}

then reading the same file with:

{code:python}
pd.read_parquet(path, engine="pyarrow")
{code}

raises an exception:

{code:java}
File //lib/python3.9/site-packages/pyarrow/pandas_compat.py:997, in _extract_index_level(table, result_table, field_name, field_name_to_metadata)
    995 def _extract_index_level(table, result_table, field_name,
    996                          field_name_to_metadata):
--> 997     logical_name = field_name_to_metadata[field_name]['name']
    998     index_name = _backwards_compatible_index_name(field_name, logical_name)
    999     i = table.schema.get_field_index(field_name)

KeyError: 'b'
{code}

while with pandas 1.4.4 and pyarrow 9.0.0 it works correctly. Note that the problem disappears if the parquet file is written with index=True (which is not the default value), probably because the RangeIndex is converted to Int64Index:

{code:python}
df1.to_parquet(path, engine="pyarrow", index=True)
{code}

I suspect that the issue is caused by the change from Int64Index to RangeIndex, and it may be related to [https://github.com/pandas-dev/pandas/issues/46675]. Should pyarrow be able to handle this case? Or is it an issue with Pandas? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17805) [C++][CI] Use Brew installed clang for MacOS
Jin Shang created ARROW-17805: - Summary: [C++][CI] Use Brew installed clang for MacOS Key: ARROW-17805 URL: https://issues.apache.org/jira/browse/ARROW-17805 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 9.0.0 Reporter: Jin Shang Fix For: 10.0.0 Also needs to solve compatibility issues with clang-15 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17804) [Go][CSV] Add Date32 and Time32 parsers
Matthew Topol created ARROW-17804: - Summary: [Go][CSV] Add Date32 and Time32 parsers Key: ARROW-17804 URL: https://issues.apache.org/jira/browse/ARROW-17804 Project: Apache Arrow Issue Type: Improvement Components: Go Reporter: Matthew Topol Assignee: Matthew Topol -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17803) [C++] Use [[nodiscard]]
Antoine Pitrou created ARROW-17803: -- Summary: [C++] Use [[nodiscard]] Key: ARROW-17803 URL: https://issues.apache.org/jira/browse/ARROW-17803 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Antoine Pitrou Assignee: Antoine Pitrou Fix For: 10.0.0 We currently have an {{ARROW_MUST_USE_TYPE}} macro that's only enabled on clang-based builds. Instead, we can use the {{[[nodiscard]]}} attribute that's standard in C++17. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17802) Merging multi file datasets on particular columns that are present in all the datasets.
N Gautam Animesh created ARROW-17802: Summary: Merging multi file datasets on particular columns that are present in all the datasets. Key: ARROW-17802 URL: https://issues.apache.org/jira/browse/ARROW-17802 Project: Apache Arrow Issue Type: Improvement Reporter: N Gautam Animesh

While working with multi-file datasets, I came across a situation where I wanted to merge specific columns from all the datasets and work on them. Since I was not able to do so, I want to know whether there is any workaround for merging multi-file datasets on some specific columns. Please look into it and let me know if there's anything regarding this.

{code:r}
system.time({
  df <- open_dataset('C:/Test/Files/test', format = "arrow")
  df <- df %>%
    collect()
  # merging logic so as to select only specified column(s)
  # write_dataset(df, 'C:/Test/Files/test', format = "arrow")
})
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17801) [Doc][Java] Fix typos in slice page in Cookbook
Larry White created ARROW-17801: --- Summary: [Doc][Java] Fix typos in slice page in Cookbook Key: ARROW-17801 URL: https://issues.apache.org/jira/browse/ARROW-17801 Project: Apache Arrow Issue Type: Improvement Components: Java Affects Versions: 9.0.0 Reporter: Larry White Assignee: Larry White The slice instructions say "splice" in a couple of places. Check for other typos as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17800) [C++] Failure in jemalloc stats tests
Antoine Pitrou created ARROW-17800: -- Summary: [C++] Failure in jemalloc stats tests Key: ARROW-17800 URL: https://issues.apache.org/jira/browse/ARROW-17800 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Antoine Pitrou Fix For: 10.0.0

I just got this when running the tests locally:

{code}
[----------] 2 tests from Jemalloc
[ RUN      ] Jemalloc.SetDirtyPageDecayMillis
[       OK ] Jemalloc.SetDirtyPageDecayMillis (0 ms)
[ RUN      ] Jemalloc.GetAllocationStats
/home/antoine/arrow/dev/cpp/src/arrow/memory_pool_test.cc:218: Failure
The difference between metadata0 and 300 is 2962256, which exceeds 100, where
metadata0 evaluates to 5962256,
300 evaluates to 300, and
100 evaluates to 100.
[  FAILED  ] Jemalloc.GetAllocationStats (0 ms)
[----------] 2 tests from Jemalloc (0 ms total)
{code}

It looks like those checks should be relaxed to allow for more context-dependent behaviour. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17799) [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to Parquet writer
Rok Mihevc created ARROW-17799: -- Summary: [C++][Parquet] Add DELTA_LENGTH_BYTE_ARRAY encoder to Parquet writer Key: ARROW-17799 URL: https://issues.apache.org/jira/browse/ARROW-17799 Project: Apache Arrow Issue Type: New Feature Components: C++, Parquet Reporter: Rok Mihevc We need to add DELTA_LENGTH_BYTE_ARRAY encoder to implement DELTA_BYTE_ARRAY encoder (ARROW-17619). ARROW-13388 already implemented DELTA_LENGTH_BYTE_ARRAY decoder. -- This message was sent by Atlassian Jira (v8.20.10#820010)
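For context, DELTA_LENGTH_BYTE_ARRAY stores all value lengths up front (themselves DELTA_BINARY_PACKED) followed by the concatenated value bytes. A toy model of that layout, not the actual Parquet bit-packing:

```python
def delta_length_byte_array(values):
    """Toy model: lengths block first, then all value bytes concatenated.

    Real Parquet delta-binary-packs the lengths; they are kept as a plain
    list here for clarity.
    """
    lengths = [len(v) for v in values]
    data = b"".join(values)
    return lengths, data

def decode(lengths, data):
    """Recover the byte-array values by walking the lengths block."""
    out, pos = [], 0
    for n in lengths:
        out.append(data[pos:pos + n])
        pos += n
    return out
```

Splitting lengths from data is what makes the encoding effective: the lengths compress well as integers, and the data block stays contiguous.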
[jira] [Created] (ARROW-17798) [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer
Rok Mihevc created ARROW-17798: -- Summary: [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer Key: ARROW-17798 URL: https://issues.apache.org/jira/browse/ARROW-17798 Project: Apache Arrow Issue Type: New Feature Components: C++, Parquet Reporter: Rok Mihevc We need to add DELTA_BINARY_PACKED encoder to implement DELTA_BYTE_ARRAY encoder (ARROW-17619). -- This message was sent by Atlassian Jira (v8.20.10#820010)
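DELTA_BINARY_PACKED stores a first value plus the differences between consecutive values, which are then bit-packed in blocks. A toy sketch of just the delta step (the real encoding adds block/miniblock headers, a min-delta offset, and bit-packing):

```python
def delta_encode(values):
    """Toy model: first value + consecutive differences."""
    first = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    return first, deltas

def delta_decode(first, deltas):
    """Rebuild the original values by accumulating the deltas."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out
```

Sorted or slowly-varying integer columns produce small, repetitive deltas, which is why this encoding packs them into very few bits.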
[jira] [Created] (ARROW-17797) [Java] Remove deprecated methods from Java dataset module in Arrow 11
David Li created ARROW-17797: Summary: [Java] Remove deprecated methods from Java dataset module in Arrow 11 Key: ARROW-17797 URL: https://issues.apache.org/jira/browse/ARROW-17797 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: David Li ARROW-15745 deprecated some things in the Dataset module which should be removed for Arrow >= 11 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17796) Using cbind when merging multi datasets using open_dataset on a directory.
N Gautam Animesh created ARROW-17796: Summary: Using cbind when merging multi datasets using open_dataset on a directory. Key: ARROW-17796 URL: https://issues.apache.org/jira/browse/ARROW-17796 Project: Apache Arrow Issue Type: Task Reporter: N Gautam Animesh I was wondering if we can use cbind, specifying particular column names, when merging multiple datasets using open_dataset(), so that we can bind only those particular columns. I was using open_dataset to read multiple datasets in a particular directory and wanted to merge them based on some particular columns that are common to all the datasets. Is it possible to merge these datasets column-wise, since by default open_dataset merges all the datasets one after the other row-wise? Do let me know if there's anything like this or any other workaround. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17795) [C++][R] Using ARROW_ZSTD_USE_SHARED fails
Jacob Wujciak-Jens created ARROW-17795: -- Summary: [C++][R] Using ARROW_ZSTD_USE_SHARED fails Key: ARROW-17795 URL: https://issues.apache.org/jira/browse/ARROW-17795 Project: Apache Arrow Issue Type: Improvement Components: C++, R Reporter: Jacob Wujciak-Jens Fix For: 10.0.0 See the Zulip discussion [here|https://ursalabs.zulipchat.com/#narrow/stream/180245-dev/topic/zstd.20cmake.20changes]. Changes to the find-zstd CMake module cause a failure when ARROW_ZSTD_USE_SHARED is used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17794) [Java] Force delete jni lib on JVM exit
Jackey Lee created ARROW-17794: -- Summary: [Java] Force delete jni lib on JVM exit Key: ARROW-17794 URL: https://issues.apache.org/jira/browse/ARROW-17794 Project: Apache Arrow Issue Type: Improvement Components: Java Affects Versions: 10.0.0 Reporter: Jackey Lee Use `FileUtils.forceDeleteOnExit` to delete the JNI lib file on JVM exit. `FileUtils.forceDeleteOnExit` actually adds a shutdown hook to make sure the file is deleted. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17793) [C++] Adding Union Relation ToProto
Vibhatha Lakmal Abeykoon created ARROW-17793: Summary: [C++] Adding Union Relation ToProto Key: ARROW-17793 URL: https://issues.apache.org/jira/browse/ARROW-17793 Project: Apache Arrow Issue Type: Sub-task Reporter: Vibhatha Lakmal Abeykoon Assignee: Vibhatha Lakmal Abeykoon The union relation also requires an Arrow->Substrait converter. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17792) [C++] Use lambda capture move construction
Antoine Pitrou created ARROW-17792: -- Summary: [C++] Use lambda capture move construction Key: ARROW-17792 URL: https://issues.apache.org/jira/browse/ARROW-17792 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Antoine Pitrou C++17 allows us to move-construct captured lambda variables, whereas before we had to write functors. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17791) [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket
Raúl Cumplido created ARROW-17791: - Summary: [Python][CI] Some nightly jobs are failing due to ACCESS_DENIED to S3 bucket Key: ARROW-17791 URL: https://issues.apache.org/jira/browse/ARROW-17791 Project: Apache Arrow Issue Type: Bug Components: Continuous Integration, Python Reporter: Raúl Cumplido

The following nightly failures:
* [test-conda-python-3.10|https://github.com/ursacomputing/crossbow/actions/runs/3094438413/jobs/5007812721]
* [test-conda-python-3.7|https://github.com/ursacomputing/crossbow/actions/runs/3094412849/jobs/5007760110]
* [test-conda-python-3.7-pandas-0.24|https://github.com/ursacomputing/crossbow/actions/runs/3094422644/jobs/5007779545]
* [test-conda-python-3.7-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094419759/jobs/5007773935]
* [test-conda-python-3.8|https://github.com/ursacomputing/crossbow/actions/runs/309904/jobs/5007827002]
* [test-conda-python-3.8-pandas-latest|https://github.com/ursacomputing/crossbow/actions/runs/3094405494/jobs/5007746062]
* [test-conda-python-3.8-pandas-nightly|https://github.com/ursacomputing/crossbow/actions/runs/3094407475/jobs/5007750212]
* [test-conda-python-3.9|https://github.com/ursacomputing/crossbow/actions/runs/3094450745/jobs/5007839959]
* [test-conda-python-3.9-pandas-master|https://github.com/ursacomputing/crossbow/actions/runs/3094401032/jobs/5007736715]
* [test-debian-11-python-3|https://github.com/ursacomputing/crossbow/runs/8465194776]

failed on the Python test test_s3_real_aws_region_selection with ACCESS_DENIED:

{code:java}
=== FAILURES ===
__ test_s3_real_aws_region_selection __

    @pytest.mark.s3
    def test_s3_real_aws_region_selection():
        # Taken from a registry of open S3-hosted datasets
        # at https://github.com/awslabs/open-data-registry
        fs, path = FileSystem.from_uri('s3://mf-nwp-models/README.txt')
        assert fs.region == 'eu-west-1'
>       with fs.open_input_stream(path) as f:

opt/conda/envs/arrow/lib/python3.10/site-packages/pyarrow/tests/test_fs.py:1660:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pyarrow/_fs.pyx:805: in pyarrow._fs.FileSystem.open_input_stream
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>   ???
E   OSError: When reading information for key 'README.txt' in bucket 'mf-nwp-models': AWS Error ACCESS_DENIED during HeadObject operation: No response body.

pyarrow/error.pxi:115: OSError
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-17790) [C++][Gandiva] Adapt to LLVM opaque pointer
Jin Shang created ARROW-17790: - Summary: [C++][Gandiva] Adapt to LLVM opaque pointer Key: ARROW-17790 URL: https://issues.apache.org/jira/browse/ARROW-17790 Project: Apache Arrow Issue Type: Improvement Components: C++ - Gandiva Affects Versions: 9.0.0 Reporter: Jin Shang Fix For: 10.0.0 Starting from LLVM 13, LLVM IR has been shifting towards a unified opaque pointer type, i.e. pointers without pointee types. It has provided workarounds until LLVM 15. The temporary workarounds need to be replaced in order to support LLVM 15 and onwards. For more background info, see [https://llvm.org/docs/OpaquePointers.html] and [https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html] -- This message was sent by Atlassian Jira (v8.20.10#820010)