Python 3.7.6 (default, Jan 30 2020, 10:29:04)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
print(pyarrow.__version__)
0.17.1
Seeing an issue where DECIMAL values written to Parquet appear to be corrupted,
triggered by very subtle changes to the data set. Example:
#!/bin/python3
import pandas as pd
import pyarrow
Ah - I hadn't thought about how the object dtype complicates things:
What I'm trying to do at a higher level is maybe wacky:
- I want a set of parquet files to be read/written by PySpark and Pandas
interchangeably.
- For each file, I want to specify, in code, the column types.