Using the Python APIs, there is no direct way to specify the physical storage type other than by changing the Arrow type. If you want the values stored as int32 or int64, you first need to cast the Arrow data to that type.

Joris
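[Editor's note: a minimal sketch of the cast-and-write approach described above. The column name, scale factor, and file name are only illustrative. Note that a plain cast from decimal128 to int32 would truncate the fractional digits, so the scaling here is done by hand, and the per-column encoding only takes effect with dictionary encoding disabled.]

import pyarrow as pa
import pyarrow.parquet as pq

# Represent Decimal(6, 4) values as scaled int32 integers: 1.2000 -> 12000.
scale = 4
values = [1.2, 3.4]
arr = pa.array([round(v * 10**scale) for v in values], type=pa.int32())
table = pa.table({"col": arr})

# DELTA_BINARY_PACKED applies only to INT32/INT64 physical storage,
# and requires dictionary encoding to be turned off for the column.
pq.write_table(
    table,
    "test_int32.parquet",
    use_dictionary=False,
    column_encoding={"col": "DELTA_BINARY_PACKED"},
)
print(pq.read_metadata("test_int32.parquet").schema)

The trade-off is that the file then carries a plain int32 column rather than a decimal logical type, so readers must know to divide by 10**scale themselves.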
On Thu, 25 Apr 2024 at 13:42, Brian Kiefer <[email protected]> wrote:
>
> Is there a way to force the physical storage type to INT32/INT64 rather
> than a byte array? I was hoping to use DELTA_BINARY_PACKED for the column
> encoding.
>
> On Thu, Apr 25, 2024, 03:35 Joris Van den Bossche
> <[email protected]> wrote:
>
>> Hi Brian,
>>
>> The pyarrow types are automatically mapped to Parquet types (see
>> https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types for
>> some details on this mapping), and with pyarrow you can define such a
>> decimal type:
>>
>> >>> arr = pa.array([1.2, 3.4]).cast(pa.decimal128(6, 4))
>> >>> table = pa.table({"col": arr})
>> >>> table
>> pyarrow.Table
>> col: decimal128(6, 4)
>> ----
>> col: [[1.2000,3.4000]]
>>
>> and then writing that table to Parquet will give a decimal Parquet
>> logical type:
>>
>> >>> import pyarrow.parquet as pq
>> >>> pq.write_table(table, "test_decimal.parquet")
>> >>> pq.read_metadata("test_decimal.parquet").schema
>> <pyarrow._parquet.ParquetSchema object at 0x7fca43fcc900>
>> required group field_id=-1 schema {
>>   optional fixed_len_byte_array(3) field_id=-1 col (Decimal(precision=6, scale=4));
>> }
>>
>> Best,
>> Joris
>>
>> On Thu, 25 Apr 2024 at 05:15, Brian Kiefer <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> Does anyone have an example of how to write a logical type (e.g.
>>> Decimal(6, 4)) to Parquet using pyarrow? The built-in pyarrow types for
>>> defining a Schema do not include most logical types, and I can't find
>>> anything in the pyarrow Parquet API to further annotate the columns. Is
>>> there an approach for this?
>>>
>>> Thank you,
>>> Brian
