Is there a way to force the physical storage type to INT32/INT64 rather than a byte array? I was hoping to use DELTA_BINARY_PACKED for the column encoding.

On Thu, Apr 25, 2024, 03:35 Joris Van den Bossche <[email protected]> wrote:

> Hi Brian,
>
> The pyarrow types are automatically mapped to Parquet types (see
> https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types for some
> details on this mapping), and with pyarrow you can define such a decimal
> type:
>
> >>> arr = pa.array([1.2, 3.4]).cast(pa.decimal128(6, 4))
> >>> table = pa.table({"col": arr})
> >>> table
> pyarrow.Table
> col: decimal128(6, 4)
> ----
> col: [[1.2000,3.4000]]
>
> and then writing that table to parquet will give a decimal parquet logical
> type:
>
> >>> import pyarrow.parquet as pq
> >>> pq.write_table(table, "test_decimal.parquet")
> >>> pq.read_metadata("test_decimal.parquet").schema
> <pyarrow._parquet.ParquetSchema object at 0x7fca43fcc900>
> required group field_id=-1 schema {
>   optional fixed_len_byte_array(3) field_id=-1 col (Decimal(precision=6, scale=4));
> }
>
> Best,
> Joris
>
>
> On Thu, 25 Apr 2024 at 05:15, Brian Kiefer <[email protected]> wrote:
>
>> Hello,
>>
>> Does anyone have an example of how to write a logical type (e.g.
>> Decimal(6,4)) to Parquet using pyarrow? The built in pyarrow types for
>> defining a Schema do not include most logical types and I can't find
>> anything in the pyarrow parquet API to further annotate the columns. Is
>> there an approach for this?
>>
>> Thank you,
>> Brian
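
For context on the physical types discussed in this thread, here is a minimal sketch of the arithmetic behind them. It is plain Python and does not call pyarrow; the helper names `min_decimal_byte_width` and `integer_physical_type` are hypothetical and exist only for illustration. The first helper computes the smallest two's-complement byte width that can hold every unscaled value of a given precision, which is consistent with the `fixed_len_byte_array(3)` shown for Decimal(6, 4) in the schema output. The second encodes the precision limits the Parquet format specification gives for storing DECIMAL in INT32 (precision up to 9) or INT64 (precision up to 18).

```python
from typing import Optional


def min_decimal_byte_width(precision: int) -> int:
    """Smallest number of bytes whose signed two's-complement range
    covers every unscaled decimal value with `precision` digits."""
    if precision < 1:
        raise ValueError("precision must be >= 1")
    max_unscaled = 10 ** precision - 1  # e.g. 999999 for precision 6
    width = 1
    # A signed integer of `width` bytes holds values up to 2**(8*width - 1) - 1.
    while 2 ** (8 * width - 1) - 1 < max_unscaled:
        width += 1
    return width


def integer_physical_type(precision: int) -> Optional[str]:
    """Narrowest integer physical type the Parquet spec permits for a
    DECIMAL of this precision, or None if only byte arrays are allowed."""
    if 1 <= precision <= 9:
        return "INT32"
    if 1 <= precision <= 18:
        return "INT64"
    return None


if __name__ == "__main__":
    for p in (6, 9, 18, 38):
        print(p, min_decimal_byte_width(p), integer_physical_type(p))
```

By this arithmetic a Decimal(6, 4) column fits within the INT32 limit, so the format itself allows an integer physical type for it. Whether and how a particular pyarrow version exposes that choice when writing is not covered by this sketch.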
