Re: Pyarrow uint32() int64() column type mismatch bug?

Micah Kornfield Fri, 28 Jan 2022 09:15:18 -0800

Hi Grant,
This is intended behavior: by default, Parquet files are written using
version 1 of the logical types. Version 1 does not support annotating fields
as uint32, so the values are cast to int64 to preserve them across a round
trip. If you wish to keep the type, setting the version kwarg to "2.4" or
"2.6" [1] should work.


Cheers,
Micah


[1]
https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html

On Fri, Jan 28, 2022 at 9:04 AM Grant Williams <[email protected]>
wrote:

> Hello,
>
> I've found that if you write a file whose schema specifies column A as a
> uint32() type and then read the file back and inspect the schema, it
> shows column A as int64(). This issue appears to be unique to the
> uint32() type and I was unable to get any other type mismatches with the
> other integer or float types.
>
> The following is a link to a gist showing a minimal code example and the
> output from it:
> https://gist.github.com/grantmwilliams/1ceb490312c59e4fb6e4bc15b57e9707.
>
> I'm not sure if this is a problem with the physical datatype being
> actually written as int64, or if the metadata for the file is just wrong
> instead. Does anyone have any idea what could be causing this? Or whether
> it's just a metadata issue or an actual physical type error?
>
> Thanks,
> Grant W.
> --
> Grant Williams
> Machine Learning Engineer
> https://github.com/grantmwilliams/
>
