Thank you, Micah! That makes sense. Do you have any thoughts on adding a logged warning when a user calls write_table() with uint32() in the given schema?
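A minimal user-side sketch of such a warning (the wrapper below is hypothetical, not an existing pyarrow API; it assumes a pyarrow release where the default Parquet format version still uses the v1 logical types):

    import warnings

    import pyarrow as pa
    import pyarrow.parquet as pq

    def write_table_warning_on_uint32(table, where, **kwargs):
        """Hypothetical wrapper around pq.write_table() that warns when a
        uint32 column is about to be silently widened to int64."""
        # Format versions "2.4" and "2.6" can annotate unsigned 32-bit
        # columns; the v1 default (at the time of this thread) cannot.
        if kwargs.get("version") not in ("2.4", "2.6"):
            for field in table.schema:
                if pa.types.is_uint32(field.type):
                    warnings.warn(
                        f"Column {field.name!r} is uint32 and will be stored "
                        "as int64 under Parquet logical types v1; pass "
                        "version='2.4' or version='2.6' to preserve the type."
                    )
        pq.write_table(table, where, **kwargs)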
On Fri, Jan 28, 2022 at 11:15 AM Micah Kornfield <[email protected]> wrote:

> Hi Grant,
> This is intended behavior: by default, parquet is written with version 1
> of the logical types. Version 1 does not support annotating fields as
> uint32, so to preserve the values across a round trip they are cast to
> int64. If you wish to maintain the type, setting the version kwarg to 2.4
> or 2.6 [1] should work.
>
> Cheers,
> Micah
>
> [1] https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html
>
> On Fri, Jan 28, 2022 at 9:04 AM Grant Williams <[email protected]> wrote:
>
>> Hello,
>>
>> I've found that if you write a file whose schema specifies column A as
>> uint32(), then read the file back and inspect the schema, column A is
>> shown as int64(). This issue appears to be unique to the uint32() type;
>> I was unable to produce a mismatch with any of the other integer or
>> float types.
>>
>> Here is a gist with a minimal code example and its output:
>> https://gist.github.com/grantmwilliams/1ceb490312c59e4fb6e4bc15b57e9707
>>
>> I'm not sure whether the physical datatype is actually being written as
>> int64 or whether only the file's metadata is wrong. Does anyone have an
>> idea what could be causing this, and whether it's just a metadata issue
>> or an actual physical type error?
>>
>> Thanks,
>> Grant W.
>> --
>> Grant Williams
>> Machine Learning Engineer
>> https://github.com/grantmwilliams/

--
Grant Williams
Machine Learning Engineer
https://github.com/grantmwilliams/
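For anyone who lands on this thread later, a minimal sketch of the round trip Micah describes (file names are arbitrary; the commented output assumes a pyarrow release where the default Parquet format version is still 1.0):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # A one-column table whose schema explicitly declares uint32.
    table = pa.table(
        {"a": [1, 2, 3]},
        schema=pa.schema([pa.field("a", pa.uint32())]),
    )

    # Default (v1 logical types): uint32 is cast to int64 on write.
    pq.write_table(table, "default.parquet")
    print(pq.read_table("default.parquet").schema.field("a").type)  # int64

    # Opting in to the newer logical types preserves the unsigned type.
    pq.write_table(table, "v26.parquet", version="2.6")
    print(pq.read_table("v26.parquet").schema.field("a").type)  # uint32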
