Possible Decimal write issue with pyarrow

2020-06-01 Thread Rich Bramante
Python 3.7.6 (default, Jan 30 2020, 10:29:04) [GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
print(pyarrow.__version__)
0.17.1

Seeing an issue where DECIMAL values written can seem to be corrupted based on very subtle changes to the data set. Example:

#!/bin/python3
import pandas as pd
import

Re: check whether pandas type is convertible to arrow type

2020-06-01 Thread Sandy Ryza
Ah - I hadn't thought about how the object dtype complicates things: What I'm trying to do at a higher level is maybe wacky: - I want a set of parquet files to be read/written by PySpark and Pandas interchangeably. - For each file, I want to specify, in code, the column types