[jira] [Created] (ARROW-15562) [Java] FuzzIpcStream: Uncaught exception in java.base/java.nio.Bits.reserveMemory
Liya Fan created ARROW-15562: Summary: [Java] FuzzIpcStream: Uncaught exception in java.base/java.nio.Bits.reserveMemory Key: ARROW-15562 URL: https://issues.apache.org/jira/browse/ARROW-15562 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Liya Fan Detailed Report: [https://oss-fuzz.com/testcase?key=4599882936090624] Project: arrow-java Fuzzing Engine: libFuzzer Fuzz Target: FuzzIpcStream Job Type: libfuzzer_asan_arrow-java Platform Id: linux Crash Type: Uncaught exception Crash Address: Crash State: java.base/java.nio.Bits.reserveMemory java.base/java.nio.DirectByteBuffer. java.base/java.nio.ByteBuffer.allocateDirect Sanitizer: address (ASAN) Recommended Security Severity: Low Crash Revision: [https://oss-fuzz.com/revisions?job=libfuzzer_asan_arrow-java=202202010604] Reproducer Testcase: [https://oss-fuzz.com/download?testcase_id=4599882936090624] Issue filed automatically. See [https://google.github.io/oss-fuzz/advanced-topics/reproducing] for instructions to reproduce this bug locally. When you fix this bug, please * mention the fix revision(s). * state whether the bug was a short-lived regression or an old bug in any stable releases. * add any other useful information. This information can help downstream consumers. If you need to contact the OSS-Fuzz team with a question, concern, or any other feedback, please file an issue at [https://github.com/google/oss-fuzz/issues]. Comments on individual Monorail issues are not monitored. This bug is subject to a 90 day disclosure deadline. If 90 days elapse without an upstream patch, then the bug report will automatically become visible to the public. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15561) [Java] FuzzIpcStream: Uncaught exception in org.apache.arrow.memory.BaseAllocator.buffer
Liya Fan created ARROW-15561: Summary: [Java] FuzzIpcStream: Uncaught exception in org.apache.arrow.memory.BaseAllocator.buffer Key: ARROW-15561 URL: https://issues.apache.org/jira/browse/ARROW-15561 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Liya Fan Detailed Report: https://oss-fuzz.com/testcase?key=6427573486223360 Project: arrow-java Fuzzing Engine: libFuzzer Fuzz Target: FuzzIpcStream Job Type: libfuzzer_asan_arrow-java Platform Id: linux Crash Type: Uncaught exception Crash Address: Crash State: org.apache.arrow.memory.BaseAllocator.buffer org.apache.arrow.memory.RootAllocator.buffer org.apache.arrow.memory.BaseAllocator.buffer Sanitizer: address (ASAN) Crash Revision: https://oss-fuzz.com/revisions?job=libfuzzer_asan_arrow-java=202201310606 Reproducer Testcase: https://oss-fuzz.com/download?testcase_id=6427573486223360 Issue filed automatically. See https://google.github.io/oss-fuzz/advanced-topics/reproducing for instructions to reproduce this bug locally. When you fix this bug, please * mention the fix revision(s). * state whether the bug was a short-lived regression or an old bug in any stable releases. * add any other useful information. This information can help downstream consumers. If you need to contact the OSS-Fuzz team with a question, concern, or any other feedback, please file an issue at https://github.com/google/oss-fuzz/issues. Comments on individual Monorail issues are not monitored. This bug is subject to a 90 day disclosure deadline. If 90 days elapse without an upstream patch, then the bug report will automatically become visible to the public. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15560) [Java] FuzzIpcStream: Uncaught exception in java.base/java.nio.Buffer.createCapacityException
Liya Fan created ARROW-15560: Summary: [Java] FuzzIpcStream: Uncaught exception in java.base/java.nio.Buffer.createCapacityException Key: ARROW-15560 URL: https://issues.apache.org/jira/browse/ARROW-15560 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Liya Fan Detailed Report: https://oss-fuzz.com/testcase?key=5095153130405888 Project: arrow-java Fuzzing Engine: libFuzzer Fuzz Target: FuzzIpcStream Job Type: libfuzzer_asan_arrow-java Platform Id: linux Crash Type: Uncaught exception Crash Address: Crash State: java.base/java.nio.Buffer.createCapacityException java.base/java.nio.ByteBuffer.allocate org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage Sanitizer: address (ASAN) Crash Revision: https://oss-fuzz.com/revisions?job=libfuzzer_asan_arrow-java=202201300605 Reproducer Testcase: https://oss-fuzz.com/download?testcase_id=5095153130405888 Issue filed automatically. See https://google.github.io/oss-fuzz/advanced-topics/reproducing for instructions to reproduce this bug locally. When you fix this bug, please * mention the fix revision(s). * state whether the bug was a short-lived regression or an old bug in any stable releases. * add any other useful information. This information can help downstream consumers. If you need to contact the OSS-Fuzz team with a question, concern, or any other feedback, please file an issue at https://github.com/google/oss-fuzz/issues. Comments on individual Monorail issues are not monitored. This bug is subject to a 90 day disclosure deadline. If 90 days elapse without an upstream patch, then the bug report will automatically become visible to the public. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15559) [Java] FuzzIpcFile: Uncaught exception in org.apache.arrow.vector.types.pojo.Schema.convertSchema
Liya Fan created ARROW-15559: Summary: [Java] FuzzIpcFile: Uncaught exception in org.apache.arrow.vector.types.pojo.Schema.convertSchema Key: ARROW-15559 URL: https://issues.apache.org/jira/browse/ARROW-15559 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Liya Fan Detailed Report: https://oss-fuzz.com/testcase?key=5965184743636992 Project: arrow-java Fuzzing Engine: libFuzzer Fuzz Target: FuzzIpcFile Job Type: libfuzzer_asan_arrow-java Platform Id: linux Crash Type: Uncaught exception Crash Address: Crash State: org.apache.arrow.vector.types.pojo.Schema.convertSchema org.apache.arrow.vector.ipc.message.ArrowFooter. org.apache.arrow.vector.ipc.ArrowFileReader.readSchema Sanitizer: address (ASAN) Crash Revision: https://oss-fuzz.com/revisions?job=libfuzzer_asan_arrow-java=202201300605 Reproducer Testcase: https://oss-fuzz.com/download?testcase_id=5965184743636992 Issue filed automatically. See https://google.github.io/oss-fuzz/advanced-topics/reproducing for instructions to reproduce this bug locally. When you fix this bug, please * mention the fix revision(s). * state whether the bug was a short-lived regression or an old bug in any stable releases. * add any other useful information. This information can help downstream consumers. If you need to contact the OSS-Fuzz team with a question, concern, or any other feedback, please file an issue at https://github.com/google/oss-fuzz/issues. Comments on individual Monorail issues are not monitored. This bug is subject to a 90 day disclosure deadline. If 90 days elapse without an upstream patch, then the bug report will automatically become visible to the public. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15558) [Java] FuzzIpcFile: Uncaught exception in java.base/java.nio.Buffer.checkIndex
Liya Fan created ARROW-15558: Summary: [Java] FuzzIpcFile: Uncaught exception in java.base/java.nio.Buffer.checkIndex Key: ARROW-15558 URL: https://issues.apache.org/jira/browse/ARROW-15558 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Liya Fan Detailed Report: https://oss-fuzz.com/testcase?key=5518211972464640 Project: arrow-java Fuzzing Engine: libFuzzer Fuzz Target: FuzzIpcFile Job Type: libfuzzer_asan_arrow-java Platform Id: linux Crash Type: Uncaught exception Crash Address: Crash State: java.base/java.nio.Buffer.checkIndex java.base/java.nio.HeapByteBuffer.getInt com.google.flatbuffers.Table.__reset Sanitizer: address (ASAN) Crash Revision: https://oss-fuzz.com/revisions?job=libfuzzer_asan_arrow-java=202201300605 Reproducer Testcase: https://oss-fuzz.com/download?testcase_id=5518211972464640 Issue filed automatically. See https://google.github.io/oss-fuzz/advanced-topics/reproducing for instructions to reproduce this bug locally. When you fix this bug, please * mention the fix revision(s). * state whether the bug was a short-lived regression or an old bug in any stable releases. * add any other useful information. This information can help downstream consumers. If you need to contact the OSS-Fuzz team with a question, concern, or any other feedback, please file an issue at https://github.com/google/oss-fuzz/issues. Comments on individual Monorail issues are not monitored. This bug is subject to a 90 day disclosure deadline. If 90 days elapse without an upstream patch, then the bug report will automatically become visible to the public. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15557) [Java] FuzzIpcFile: Uncaught exception in java.base/java.nio.HeapByteBuffer.
Liya Fan created ARROW-15557: Summary: [Java] FuzzIpcFile: Uncaught exception in java.base/java.nio.HeapByteBuffer. Key: ARROW-15557 URL: https://issues.apache.org/jira/browse/ARROW-15557 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Liya Fan Detailed Report: https://oss-fuzz.com/testcase?key=5015797066498048 Project: arrow-java Fuzzing Engine: libFuzzer Fuzz Target: FuzzIpcFile Job Type: libfuzzer_asan_arrow-java Platform Id: linux Crash Type: Uncaught exception Crash Address: Crash State: java.base/java.nio.HeapByteBuffer. java.base/java.nio.ByteBuffer.allocate org.apache.arrow.vector.ipc.ArrowFileReader.readSchema Sanitizer: address (ASAN) Recommended Security Severity: Low Crash Revision: https://oss-fuzz.com/revisions?job=libfuzzer_asan_arrow-java=202201300605 Reproducer Testcase: https://oss-fuzz.com/download?testcase_id=5015797066498048 Issue filed automatically. See https://google.github.io/oss-fuzz/advanced-topics/reproducing for instructions to reproduce this bug locally. When you fix this bug, please * mention the fix revision(s). * state whether the bug was a short-lived regression or an old bug in any stable releases. * add any other useful information. This information can help downstream consumers. If you need to contact the OSS-Fuzz team with a question, concern, or any other feedback, please file an issue at https://github.com/google/oss-fuzz/issues. Comments on individual Monorail issues are not monitored. This bug is subject to a 90 day disclosure deadline. If 90 days elapse without an upstream patch, then the bug report will automatically become visible to the public. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15556) [Release] Add a script to update Homebrew packages
Kouhei Sutou created ARROW-15556: Summary: [Release] Add a script to update Homebrew packages Key: ARROW-15556 URL: https://issues.apache.org/jira/browse/ARROW-15556 Project: Apache Arrow Issue Type: Improvement Components: Packaging Reporter: Kouhei Sutou Assignee: Kouhei Sutou -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15555) [Release] Post release version bumping script tries to push the release tag
Krisztian Szucs created ARROW-15555: --- Summary: [Release] Post release version bumping script tries to push the release tag Key: ARROW-15555 URL: https://issues.apache.org/jira/browse/ARROW-15555 Project: Apache Arrow Issue Type: Bug Components: Developer Tools Reporter: Krisztian Szucs Fix For: 8.0.0 fatal: tag 'apache-arrow-7.0.0' already exists -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15554) [Format][C++] Add "LargeMap" type with 64-bit offsets
Sarah Gilmore created ARROW-15554: - Summary: [Format][C++] Add "LargeMap" type with 64-bit offsets Key: ARROW-15554 URL: https://issues.apache.org/jira/browse/ARROW-15554 Project: Apache Arrow Issue Type: Improvement Components: C++, Format Reporter: Sarah Gilmore It would be nice if a "LargeMap" type existed alongside the "Map" type for parity. For other datatypes that require offset arrays/buffers, such as String, List, and BinaryArray, Arrow provides a "large" version of each, i.e. LargeString, LargeList, and LargeBinaryArray. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15553) Possible bug with array.TableReader
Charlie Hubbard created ARROW-15553: --- Summary: Possible bug with array.TableReader Key: ARROW-15553 URL: https://issues.apache.org/jira/browse/ARROW-15553 Project: Apache Arrow Issue Type: Bug Components: Go Affects Versions: 6.0.1 Environment: ubuntu20.04, go1.16.13 Reporter: Charlie Hubbard Attachments: chunksize.go I'm writing an array of int32 to a byte buffer using an array.TableReader and ipc.Writer. The array is [1, 2, 3 (null), 4]. I then read it back from the byte buffer using an ipc.Reader. If the chunk size of the array.TableReader is smaller than the size of the array, the null is ignored: I read back `[1, 2, 3, 4]`. Is this a bug in my code or in Apache Arrow? Code attached. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15552) [Docs][Format] Unclear wording about base64 encoding requirement of metadata values
Joris Van den Bossche created ARROW-15552: - Summary: [Docs][Format] Unclear wording about base64 encoding requirement of metadata values Key: ARROW-15552 URL: https://issues.apache.org/jira/browse/ARROW-15552 Project: Apache Arrow Issue Type: Improvement Components: Documentation, Format Reporter: Joris Van den Bossche The C Data Interface docs indicate that the values in key-value metadata should be base64 encoded, which is mentioned in the section about which key-value metadata to use for extension types (https://arrow.apache.org/docs/format/CDataInterface.html#extension-arrays): bq. The base64 encoding of metadata values ensures that any possible serialization is representable. This might not be fully correct, though (or at least base64 is not required, although the current wording implies it is). While a binary blob (like a serialized schema) can be base64 encoded, as we do when putting the Arrow schema in the Parquet metadata, is that actually required here? cc [~apitrou] -- This message was sent by Atlassian Jira (v8.20.1#820001)
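To make the distinction concrete, a stdlib-only sketch: base64 maps arbitrary bytes into printable ASCII, which matters for carriers that cannot hold raw binary, whereas a length-prefixed encoding can carry the raw bytes directly (the blob below is an arbitrary stand-in, not a real serialized schema):

```python
import base64

blob = bytes([0x00, 0xFF, 0x10, 0x41])   # arbitrary binary metadata value

encoded = base64.b64encode(blob)          # printable ASCII: b'AP8QQQ=='
assert base64.b64decode(encoded) == blob  # lossless roundtrip

# A length-prefixed layout needs no base64: store len(blob), then the raw bytes.
framed = len(blob).to_bytes(4, "little") + blob
```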
[jira] [Created] (ARROW-15551) [C++][FlightRPC] gRPC appears to have broken TlsCredentialsOptions again
David Li created ARROW-15551: Summary: [C++][FlightRPC] gRPC appears to have broken TlsCredentialsOptions again Key: ARROW-15551 URL: https://issues.apache.org/jira/browse/ARROW-15551 Project: Apache Arrow Issue Type: Improvement Components: C++, FlightRPC Reporter: David Li Assignee: David Li With gRPC 1.43.2 {noformat} -- Checking support for TlsCredentialsOptions (gRPC >= 1.36)... -- TlsCredentialsOptions (for gRPC 1.36) not found in grpc::experimental. -- Checking support for TlsCredentialsOptions (gRPC >= 1.34)... -- TlsCredentialsOptions (for gRPC 1.34) not found in grpc::experimental. -- Checking support for TlsCredentialsOptions (gRPC >= 1.32)... -- TlsCredentialsOptions (for gRPC 1.32) not found in grpc::experimental. -- Checking support for TlsCredentialsOptions (gRPC >= 1.27)... -- TlsCredentialsOptions (for gRPC 1.27) not found in grpc::experimental. -- Found approximate gRPC version: (ARROW_FLIGHT_REQUIRE_TLSCREDENTIALSOPTIONS=) -- A proper version of gRPC could not be found to support TlsCredentialsOptions in Arrow Flight. -- You may need a newer version of gRPC (>= 1.27), or the gRPC API has changed and Flight must be updated to match. {noformat} We need to update the detection again. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15550) [C++] Add an environment variable to debug memory
Antoine Pitrou created ARROW-15550: -- Summary: [C++] Add an environment variable to debug memory Key: ARROW-15550 URL: https://issues.apache.org/jira/browse/ARROW-15550 Project: Apache Arrow Issue Type: Wish Components: C++, Developer Tools Reporter: Antoine Pitrou It could be useful to allow enabling memory checks at runtime (e.g. check that memory was not written out of bounds when a MemoryPool deallocates a buffer), for example via an environment variable setting {{ARROW_DEBUG_MEMORY_POOL=(warn|trap|abort)}}. See https://github.com/apache/arrow/pull/12216 for inspiration. -- This message was sent by Atlassian Jira (v8.20.1#820001)
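If implemented, usage might look like this sketch (the variable name and values are the ones proposed above, not an existing feature):

```python
import os

# Proposed (not yet implemented at the time of this issue): the variable would
# need to be set before the Arrow C++ runtime creates its default memory pool,
# e.g. before `import pyarrow`, so the pool can check buffer bounds on free.
os.environ["ARROW_DEBUG_MEMORY_POOL"] = "warn"   # proposed values: warn|trap|abort
```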
[jira] [Created] (ARROW-15549) [Java] gRPC not available on M1
Rok Mihevc created ARROW-15549: -- Summary: [Java] gRPC not available on M1 Key: ARROW-15549 URL: https://issues.apache.org/jira/browse/ARROW-15549 Project: Apache Arrow Issue Type: New Feature Reporter: Rok Mihevc When building on M1 gRPC is not found. It can be [manually downloaded|https://repo1.maven.org/maven2/io/grpc/protoc-gen-grpc-java/1.41.0/protoc-gen-grpc-java-1.41.0-osx-x86_64.exe] and installed: {code:bash} mvn install:install-file -DgroupId=io.grpc -DartifactId=protoc-gen-grpc-java -Dversion=1.41.0 -Dclassifier=osx-aarch_64 -Dpackaging=exe -Dfile=/Users/rok/Downloads/protoc-gen-grpc-java-1.41.0-osx-x86_64.exe {code} But perhaps that could be fixed as suggested here: https://github.com/grpc/grpc-java/issues/7690 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15548) [C++][Parquet] Field-level metadata are not supported? (ColumnMetadata.key_value_metadata)
Joris Van den Bossche created ARROW-15548: - Summary: [C++][Parquet] Field-level metadata are not supported? (ColumnMetadata.key_value_metadata) Key: ARROW-15548 URL: https://issues.apache.org/jira/browse/ARROW-15548 Project: Apache Arrow Issue Type: Improvement Components: C++, Parquet Reporter: Joris Van den Bossche For an application where we are considering using field-level metadata (so not schema-level metadata), and where we also want to be able to save this data to Parquet, I was looking into "field-level metadata" for Parquet, which I assumed we supported. At first sight, we can roundtrip Arrow's field-level metadata to/from Parquet, as shown with this example: {code:python} schema = pa.schema([pa.field("column_name", pa.int64(), metadata={"key": "value"})]) table = pa.table({'column_name': [0, 1, 2]}, schema=schema) pq.write_table(table, "test_field_metadata.parquet") >>> pq.read_table("test_field_metadata.parquet").schema column_name: int64 -- field metadata -- key: 'value' {code} However, the reason this is restored is actually that it is stored in the Arrow schema that we (by default) store in the {{ARROW:schema}} metadata in the Parquet FileMetaData.key_value_metadata. With a small patched version that can turn this off (currently this is hardcoded to be turned on in the Python bindings), it is clear that this field-level metadata is not restored on roundtrip without the stored Arrow schema: {code:python} pq.write_table(table, "test_field_metadata_without_schema.parquet", store_arrow_schema=False) >>> pq.read_table("test_field_metadata_without_schema.parquet").schema column_name: int64 {code} So there is currently no mapping from Arrow's field-level metadata to Parquet's column-level metadata ({{ColumnMetaData.key_value_metadata}} in Parquet's thrift structures). 
(This also means that field-level metadata roundtripping to Parquet only works as long as you use Arrow for both writing and reading, but not if you want to exchange such data with non-Arrow Parquet implementations.) In addition, it seems we don't even expose this field in our C++ or Python bindings, so that one could at least access that data when given a Parquet file (written by another implementation) that has key_value_metadata in the ColumnMetaData. cc [~emkornfield] -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15547) Regression
Charley Guillaume created ARROW-15547: - Summary: Regression Key: ARROW-15547 URL: https://issues.apache.org/jira/browse/ARROW-15547 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 6.0.1 Reporter: Charley Guillaume While trying to ingest data using pyarrow 6.0.1 with this function: {code:python} import pyarrow as pa def create_dataframe(list_dict: list) -> pa.Table: fields = set() for d in list_dict: fields = fields.union(d.keys()) dataframe = pa.table({f: [row.get(f) for row in list_dict] for f in fields}) return dataframe {code} I had the following error: {code} pyarrow.lib.ArrowInvalid: Decimal type with precision 7 does not fit into precision inferred from first array element: 8 {code} After downgrading to v4.0.1 the error was gone. The data looked like this: {noformat} [{"accounted_at": "2022-01-31T22:55:25.702000+00:00", "booked_at": "2022-01-27T09:24:17.539000+00:00", "booked_by": "7b3ce009-728d-4fbc-9120-00fa8c1c8655", "created_at": "2022-01-27T09:08:22.306000+00:00", "created_by": "7b3ce009-728d-4fbc-9120-00fa8c1c8655", "deleted_at": null, "description": "description of the record", "due_date": "2022-02-10T00:00:00+00:00", "franchise_id": "9a2858c4-5c71-43d3-b28f-2352de47ff9f", "id": "ba3f6d3a-12f4-4d78-acc5-2e59ca384c1e", "internal_code": "A.2022 / 9", "invoice_recipient_id": "7169cef9-9cb2-461f-a38f-a4d1ce3ca1c3", "lines": [{"type": "property", "amount": 7800, "soldPrice": 26, "commission": 3, "description": "Honoraires de l'agence", "commissionUnit": "PERCENT"}], "parent_id": null, "payment_term": "14-days", "recipient_emails": null, "sent_at": null, "sent_by": null, "status": "booked", "teamleader_id": "xxx-yyy-www-zzz", "type": "out"}, {"accounted_at": null, "booked_at": "2022-01-05T09:23:03.274000+00:00", "booked_by": "8a91a22d-ddb9-491a-bc2d-c06ff3f256b4", "created_at": "2022-01-05T09:21:32.503000+00:00", "created_by": "8a91a22d-ddb9-491a-bc2d-c06ff3f256b4", "deleted_at": null, "description": "Description content", "due_date": "2022-02-04T00:00:00+00:00", "franchise_id": "929d47a3-c30f-404b-aaff-c96cff1bdd10", "id": "828cd056-6aa7-4cea-9c94-ffa2db4498df", "internal_code": "BXC22 / 3", "invoice_recipient_id": "5f90aa24-4c32-401d-927c-db9d4a9f90bf", "lines": [{"type": "property", "amount": 92.55, "soldPrice": 3702.02, "commission": 2.5, "description": "description2", "commissionUnit": "PERCENT"}], "parent_id": null, "payment_term": "30-days", "recipient_emails": null, "sent_at": "2022-01-05T09:27:34.077000+00:00", "sent_by": "8a91a22d-ddb9-491a-bc2d-c06ff3f256b4", "status": "credited", "teamleader_id": "xxx-yzyzy-zzz-www", "type": "out"}]{noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15546) [FlightRPC][C++] Client cookie generation should not wrap values in quotes
James Duong created ARROW-15546: --- Summary: [FlightRPC][C++] Client cookie generation should not wrap values in quotes Key: ARROW-15546 URL: https://issues.apache.org/jira/browse/ARROW-15546 Project: Apache Arrow Issue Type: Bug Components: C++, FlightRPC Reporter: James Duong Assignee: James Duong The ClientCookieMiddleware implementation in C++ incorrectly wraps each value in the generated Cookie header in quotes. Values are permitted to be wrapped in quotes in the Set-Cookie header a server would emit, but not the Cookie header the client would send back. -- This message was sent by Atlassian Jira (v8.20.1#820001)
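The intended behavior can be sketched with a hypothetical helper (Python for illustration; the actual middleware is C++): quotes that a server may have put around a Set-Cookie value are stripped before building the client's Cookie header.

```python
def cookie_header(pairs):
    """Build a client Cookie header; values are sent unquoted."""
    parts = []
    for name, value in pairs:
        # Drop any quotes the server added around the value in Set-Cookie.
        parts.append(name + "=" + value.strip('"'))
    return "Cookie: " + "; ".join(parts)

# A server response of  Set-Cookie: session="abc123"  should yield:
print(cookie_header([("session", '"abc123"'), ("lang", "en")]))
# Cookie: session=abc123; lang=en
```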
[jira] [Created] (ARROW-15545) [C++] Cast dictionary of extension type to extension type
Joris Van den Bossche created ARROW-15545: - Summary: [C++] Cast dictionary of extension type to extension type Key: ARROW-15545 URL: https://issues.apache.org/jira/browse/ARROW-15545 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Joris Van den Bossche We support casting a DictionaryArray to its dictionary values' type. For example: {code} >>> arr = pa.array([1, 2, 1]).dictionary_encode() >>> arr -- dictionary: [ 1, 2 ] -- indices: [ 0, 1, 0 ] >>> arr.type DictionaryType(dictionary) >>> arr.cast(arr.type.value_type) [ 1, 2, 1 ] {code} However, if the type of the dictionary values is an ExtensionType, this cast is not supported: {code} >>> from pyarrow.tests.test_extension_type import UuidType >>> storage = pa.array([b"0123456789abcdef"], type=pa.binary(16)) >>> arr = pa.ExtensionArray.from_storage(UuidType(), storage) >>> arr [ 30313233343536373839616263646566 ] >>> dict_arr = pa.DictionaryArray.from_arrays(pa.array([0, 0], pa.int32()), arr) >>> dict_arr.type DictionaryType(dictionary>, indices=int32, ordered=0>) >>> dict_arr.cast(UuidType()) ... ArrowNotImplementedError: Unsupported cast from dictionary>, indices=int32, ordered=0> to extension> (no available cast function for target type) ../src/arrow/compute/cast.cc:119 GetCastFunctionInternal(cast_options->to_type, args[0].type().get()) {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15544) [Go][Parquet] pqarrow.getOriginSchema error while decoding ARROW:schema
Antoine Gelloz created ARROW-15544: -- Summary: [Go][Parquet] pqarrow.getOriginSchema error while decoding ARROW:schema Key: ARROW-15544 URL: https://issues.apache.org/jira/browse/ARROW-15544 Project: Apache Arrow Issue Type: Bug Components: Go, Parquet Affects Versions: 7.0.0 Environment: go1.17, python3.8 Reporter: Antoine Gelloz Hello! This is my first time participating in the open source community as a junior developer and I would like to thank you all for your hard work :) While using the new pqarrow package for our project [Metronlab/bow|https://github.com/Metronlab/bow] to read Parquet files previously written by Pandas, an error is returned by the function getOriginSchema if the "ARROW:schema" base64-encoded value ends with padding characters. This is caused by the use of the [RawStdEncoding|https://pkg.go.dev/encoding/base64#pkg-variables] type, which omits padding characters. Is there any reason for using raw encoding instead of standard? Here is a repo with a test script to demonstrate the problem: [antoinegelloz/arrowparquet|https://github.com/antoinegelloz/arrowparquet] Thank you in advance for your help, Antoine Gelloz -- This message was sent by Atlassian Jira (v8.20.1#820001)
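A padding-tolerant decoder is one possible fix, sketched here in Python for illustration (the actual change would be in the Go pqarrow package, e.g. switching to base64.StdEncoding or normalizing padding before decoding):

```python
import base64

def decode_maybe_padded(value: bytes) -> bytes:
    """Accept both padded and unpadded base64 input, mirroring a fix for
    the StdEncoding/RawStdEncoding mismatch."""
    # Re-add padding up to a multiple of 4 so the standard decoder accepts it.
    return base64.b64decode(value + b"=" * (-len(value) % 4))

padded = base64.b64encode(b"arrow")   # b'YXJyb3c=' -- padded, as Pandas writes it
raw = padded.rstrip(b"=")             # what an unpadded (raw) writer would emit

assert decode_maybe_padded(padded) == b"arrow"
assert decode_maybe_padded(raw) == b"arrow"
```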
[jira] [Created] (ARROW-15543) [Doc] Improve documentation on usage of Schema metadata and usage/limitations in ExecPlan
Vibhatha Lakmal Abeykoon created ARROW-15543: Summary: [Doc] Improve documentation on usage of Schema metadata and usage/limitations in ExecPlan Key: ARROW-15543 URL: https://issues.apache.org/jira/browse/ARROW-15543 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Vibhatha Lakmal Abeykoon Assignee: Vibhatha Lakmal Abeykoon Schema metadata can be an important aspect of computation and of maintaining state. How it is affected by the execution plan, and how it needs to be handled, should be documented further. -- This message was sent by Atlassian Jira (v8.20.1#820001)