Hi Brian,
The pyarrow types are automatically mapped to Parquet types (see
https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types for some
details on this mapping), and with pyarrow you can define such a decimal
type:
>>> import pyarrow as pa
>>> arr = pa.array([1.2, 3.4]).cast(pa.decimal128(6, 4))
>>> table = pa.table({"col": arr})
>>> table
pyarrow.Table
col: decimal128(6, 4)
----
col: [[1.2000,3.4000]]
and then writing that table to Parquet will give the column a Parquet decimal
logical type:
>>> import pyarrow.parquet as pq
>>> pq.write_table(table, "test_decimal.parquet")
>>> pq.read_metadata("test_decimal.parquet").schema
<pyarrow._parquet.ParquetSchema object at 0x7fca43fcc900>
required group field_id=-1 schema {
  optional fixed_len_byte_array(3) field_id=-1 col (Decimal(precision=6, scale=4));
}
Best,
Joris
On Thu, 25 Apr 2024 at 05:15, Brian Kiefer <[email protected]> wrote:
> Hello,
>
> Does anyone have an example of how to write a logical type (e.g.
> Decimal(6,4)) to Parquet using pyarrow? The built in pyarrow types for
> defining a Schema do not include most logical types and I can't find
> anything in the pyarrow parquet API to further annotate the columns. Is
> there an approach for this?
>
> Thank you,
> Brian
>
>