Is there a way to force the physical storage type to INT32/INT64 rather
than a byte array? I was hoping to use DELTA_BINARY_PACKED for the column
encoding.

On Thu, Apr 25, 2024, 03:35 Joris Van den Bossche <
[email protected]> wrote:

> Hi Brian,
>
> The pyarrow types are automatically mapped to Parquet types (see
> https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types for some
> details on this mapping), and with pyarrow you can define such a decimal
> type:
>
> >>> arr = pa.array([1.2, 3.4]).cast(pa.decimal128(6, 4))
> >>> table = pa.table({"col": arr})
> >>> table
> pyarrow.Table
> col: decimal128(6, 4)
> ----
> col: [[1.2000,3.4000]]
>
> and then writing that table to parquet will give a decimal parquet logical
> type:
>
> >>> import pyarrow.parquet as pq
> >>> pq.write_table(table, "test_decimal.parquet")
> >>> pq.read_metadata("test_decimal.parquet").schema
> <pyarrow._parquet.ParquetSchema object at 0x7fca43fcc900>
> required group field_id=-1 schema {
>   optional fixed_len_byte_array(3) field_id=-1 col (Decimal(precision=6,
> scale=4));
> }
>
> Best,
> Joris
>
>
> On Thu, 25 Apr 2024 at 05:15, Brian Kiefer <[email protected]> wrote:
>
>> Hello,
>>
>> Does anyone have an example of how to write a logical type (e.g.
>> Decimal(6,4)) to Parquet using pyarrow? The built-in pyarrow types for
>> defining a Schema do not include most logical types and I can't find
>> anything in the pyarrow parquet API to further annotate the columns. Is
>> there an approach for this?
>>
>> Thank you,
>> Brian
>>
>>
