Writing Decimal values as integers is configurable in C++ via `enable_store_decimal_as_integer` [1], but I couldn't find it plumbed through to Python.
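
To make concrete what storing a decimal as an integer means: as far as I understand the Parquet format, a DECIMAL column backed by INT32/INT64 holds the unscaled value (the decimal multiplied by 10^scale), while precision and scale live in the logical type annotation. Below is a minimal standard-library sketch of that conversion. It is not pyarrow's or Arrow C++'s implementation, and the helper names `to_unscaled` and `physical_type` are hypothetical, made up for illustration only.

```python
from decimal import Decimal


def physical_type(precision):
    """Physical type an integer-backed DECIMAL of this precision would use.

    Assumption: thresholds follow the Parquet DECIMAL rules
    (INT32 up to precision 9, INT64 up to precision 18).
    """
    if 1 <= precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"


def to_unscaled(values, precision, scale):
    """Convert Decimals to the unscaled integers of DECIMAL(precision, scale)."""
    unscaled = []
    for value in values:
        if value is None:
            # Nulls stay nulls.
            unscaled.append(None)
            continue
        shifted = value.scaleb(scale)  # multiply by 10**scale exactly
        if shifted != shifted.to_integral_value():
            raise ValueError(f"{value} has more than {scale} fractional digits")
        as_int = int(shifted)
        if abs(as_int) >= 10 ** precision:
            raise ValueError(f"{value} does not fit in precision {precision}")
        unscaled.append(as_int)
    return unscaled


values = [Decimal("1.2"), Decimal("3.4")]
print(physical_type(6), to_unscaled(values, precision=6, scale=4))
```

The resulting integers could then go into an int32 or int64 Arrow column, as Joris suggests below, though a column written that way would no longer carry the decimal annotation.
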
[1] https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet16WriterProperties7Builder31enable_store_decimal_as_integerEv

On Sun, Apr 28, 2024 at 11:52 PM Joris Van den Bossche <[email protected]> wrote:

> Using the python APIs, there is no direct way to specify the physical
> storage except for changing the Arrow type. If you want to store the values
> as int32 or int64, you first need to cast the Arrow data to such type.
>
> Joris
>
> On Thu, 25 Apr 2024 at 13:42, Brian Kiefer <[email protected]> wrote:
>
>> Is there a way to force the physical storage type to INT32/INT64 rather
>> than a byte array? I was hoping to use DELTA_BINARY_PACKED for the column
>> encoding.
>>
>> On Thu, Apr 25, 2024, 03:35 Joris Van den Bossche <[email protected]> wrote:
>>
>>> Hi Brian,
>>>
>>> The pyarrow types are automatically mapped to Parquet types (see
>>> https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types for
>>> some details on this mapping), and with pyarrow you can define such a
>>> decimal type:
>>>
>>> >>> arr = pa.array([1.2, 3.4]).cast(pa.decimal128(6, 4))
>>> >>> table = pa.table({"col": arr})
>>> >>> table
>>> pyarrow.Table
>>> col: decimal128(6, 4)
>>> ----
>>> col: [[1.2000,3.4000]]
>>>
>>> and then writing that table to parquet will give a decimal parquet
>>> logical type:
>>>
>>> >>> import pyarrow.parquet as pq
>>> >>> pq.write_table(table, "test_decimal.parquet")
>>> >>> pq.read_metadata("test_decimal.parquet").schema
>>> <pyarrow._parquet.ParquetSchema object at 0x7fca43fcc900>
>>> required group field_id=-1 schema {
>>>   optional fixed_len_byte_array(3) field_id=-1 col (Decimal(precision=6, scale=4));
>>> }
>>>
>>> Best,
>>> Joris
>>>
>>> On Thu, 25 Apr 2024 at 05:15, Brian Kiefer <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Does anyone have an example of how to write a logical type (e.g.
>>>> Decimal(6,4)) to Parquet using pyarrow? The built in pyarrow types for
>>>> defining a Schema do not include most logical types and I can't find
>>>> anything in the pyarrow parquet API to further annotate the columns. Is
>>>> there an approach for this?
>>>>
>>>> Thank you,
>>>> Brian
