Is there any way to have pa.compute.cast handle an int -> float64 cast with an accepted loss of precision?
The source value is a Python int that is too large for int64, e.g. 12345678901234567890, and I'd like to put it into a float64 field in an Arrow table.
Using pyarrow 12.0.0:
pa.array([12345678901234567890], type=pa.float64())
-> ArrowInvalid: PyLong is too large to fit int64
Converting it myself works, with the expected loss of precision:
pa.array([float(12345678901234567890)], type=pa.float64())
-> [1.2345678901234567e+19]
but I can't get pa.compute to do the same. Some examples:
pa.compute.cast([20033613169503999008], target_type=pa.float64(), safe=False)
-> OverflowError: int too big to convert

pa.compute.cast(
    [12345678901234567890],
    options=pa.compute.CastOptions.unsafe(target_type=pa.float64()),
)
-> OverflowError: int too big to convert
I also tried the other CastOptions flags, such as allow_int_overflow and allow_float_truncate (roughly as in the sketch below), with no luck.
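Roughly what I tried (I may be holding the flags wrong):

import pyarrow as pa
import pyarrow.compute as pc

pc.cast(
    [12345678901234567890],
    options=pc.CastOptions(
        target_type=pa.float64(),
        allow_int_overflow=True,
        allow_float_truncate=True,
    ),
)
-> same OverflowError: int too big to convert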
Asking Arrow to infer types hits the same error:
pa_array = pa.array([12345678901234567890])
-> OverflowError: int too big to convert
Casting to decimal128(38, 0) works if I set the type explicitly:
pa.array([12345678901234567890], type=pa.decimal128(38, 0))
-> <pyarrow.lib.Decimal128Array object at 0x000001C45FAA8B80>
   [12345678901234567890]
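That makes me wonder whether a two-step route, Python int -> decimal128 -> float64, would dodge the Python-level OverflowError. I haven't verified that the decimal128 -> float64 cast is supported in 12.0.0, or that it rounds the same way float() does; this is just a sketch:

import pyarrow as pa
import pyarrow.compute as pc

# build the array as decimal128 first (this part works, per above),
# then ask compute to cast it down to float64
decimals = pa.array([12345678901234567890], type=pa.decimal128(38, 0))
pc.cast(decimals, target_type=pa.float64(), safe=False)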
I'm working around it by doing the float() conversion myself (roughly as below), but that extra Python-level pass is of course slower.
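For reference, the current workaround looks like this:

import pyarrow as pa

raw = [12345678901234567890]  # Python ints that overflow int64
pa.array([float(v) for v in raw], type=pa.float64())
-> [1.2345678901234567e+19]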