Writing Decimal values as integers is configurable in C++ via
`enable_store_decimal_as_integer` [1], but I couldn't find it plumbed
through to Python.


[1]
https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N7parquet16WriterProperties7Builder31enable_store_decimal_as_integerEv

On Sun, Apr 28, 2024 at 11:52 PM Joris Van den Bossche <
[email protected]> wrote:

> Using the Python APIs, there is no direct way to specify the physical
> storage except by changing the Arrow type. If you want to store the values
> as int32 or int64, you first need to cast the Arrow data to that type.
>
> Joris
>
> On Thu, 25 Apr 2024 at 13:42, Brian Kiefer <[email protected]> wrote:
>
>> Is there a way to force the physical storage type to INT32/INT64 rather
>> than a byte array? I was hoping to use DELTA_BINARY_PACKED for the column
>> encoding.
>>
>> On Thu, Apr 25, 2024, 03:35 Joris Van den Bossche <
>> [email protected]> wrote:
>>
>>> Hi Brian,
>>>
>>> The pyarrow types are automatically mapped to Parquet types (see
>>> https://arrow.apache.org/docs/dev/cpp/parquet.html#logical-types for
>>> some details on this mapping), and with pyarrow you can define such a
>>> decimal type:
>>>
>>> >>> import pyarrow as pa
>>> >>> arr = pa.array([1.2, 3.4]).cast(pa.decimal128(6, 4))
>>> >>> table = pa.table({"col": arr})
>>> >>> table
>>> pyarrow.Table
>>> col: decimal128(6, 4)
>>> ----
>>> col: [[1.2000,3.4000]]
>>>
>>> and then writing that table to parquet will give a decimal parquet
>>> logical type:
>>>
>>> >>> import pyarrow.parquet as pq
>>> >>> pq.write_table(table, "test_decimal.parquet")
>>> >>> pq.read_metadata("test_decimal.parquet").schema
>>> <pyarrow._parquet.ParquetSchema object at 0x7fca43fcc900>
>>> required group field_id=-1 schema {
>>>   optional fixed_len_byte_array(3) field_id=-1 col (Decimal(precision=6, scale=4));
>>> }
>>>
>>> Best,
>>> Joris
>>>
>>>
>>> On Thu, 25 Apr 2024 at 05:15, Brian Kiefer <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Does anyone have an example of how to write a logical type (e.g.
>>>> Decimal(6,4)) to Parquet using pyarrow? The built-in pyarrow types for
>>>> defining a Schema do not include most logical types, and I can't find
>>>> anything in the pyarrow parquet API to further annotate the columns. Is
>>>> there an approach for this?
>>>>
>>>> Thank you,
>>>> Brian
>>>>
>>>>
