OK, partially answering myself: pq.ParquetDataset was used, and a filter
expression in the new format does not work here either.

import pyarrow as pa
import pyarrow.parquet as pq
import numpy as np

# write a table with a decimal128(38, 10) column to an in-memory parquet buffer
t = pa.table({"d": pa.array([1, 2, 1000]).cast(pa.decimal128(38, 10))})
f = pa.BufferOutputStream()
pq.write_table(t, f)
b = f.getvalue()

# int64 filter value on the decimal column
ds = pq.ParquetDataset(b, filters=[['d', '=', 1]])
ds.read()  # fails

Funnily enough:
['d', '=', np.int16(1000)] works, while
['d', '=', np.int16(1)] fails
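
For the record, a minimal sketch of the two variants on the same in-memory
file as above: the expression-style filter (what I mean by the "new format")
and the filter value cast explicitly to the column's decimal type, i.e. the
workaround from my mail below. I have not re-verified the cast variant with
ParquetDataset, so take that part as an assumption:

import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

t = pa.table({"d": pa.array([1, 2, 1000]).cast(pa.decimal128(38, 10))})
f = pa.BufferOutputStream()
pq.write_table(t, f)
b = f.getvalue()

# expression-style ("new format") filter - does not work here either
ds_expr = pq.ParquetDataset(b, filters=(pc.field("d") == 1))
# ds_expr.read()  # same failure

# filter value cast to the column type - presumably the same workaround
# as with read_table, but untested with ParquetDataset
ds_cast = pq.ParquetDataset(b, filters=[['d', '=', pc.cast(1, pa.decimal128(38, 10))]])
ds_cast.read()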

Best Regards,

Jacek




On Thu, 16 Nov 2023 at 13:42, Jacek Pliszka <[email protected]> wrote:
>
> Hi!
>
> I found that after upgrading from 11.0.0 to 12.0.0 (or anything above),
> read_table filters with int64 values on decimal columns stopped working.
>
> a1.pq parquet file containing:
> pyarrow.Table
> dec: decimal128(38, 10)
> ----
> dec: [[8024.0000000000,8010.0000000000]]
>
> pq.read_table("a1.pq", filters=[['dec', '==', 8024]])
>
> raises
>
> pyarrow.lib.ArrowInvalid: Precision is not great enough for the
> result. It should be at least 29
>
> It worked fine on 11.0.0. What works now is:
>
> pq.read_table("a1.pq", filters=[['dec', '==', np.int16(8024)]])
> or
> pq.read_table("a1.pq", filters=[['dec', '==', pc.cast(8024,
> pa.decimal128(38,10))]])
>
> Does someone know what happened?
>
> It looks kind of strange that it works for np.int16 and decimal but not for int64.
> And 29 seems confusing, as 2**64 < 10**20 and 38 - 10 = 28 > 20, so an int64
> value should fit in the integer digits of decimal128(38, 10).
>
> Thanks for any help,
>
> Jacek Pliszka
