Hi,

I am attempting to read a number of smaller parquet files and merge them into a 
larger parquet file.

The files are created by Spark jobs that run periodically throughout the day.

The issue is that the small parquet files can have slightly different 
schemas, and when I create the Dataset it complains that the schemas aren’t 
the same.

Spark handles this by merging the schemas together; is there equivalent 
functionality in pyarrow?

Thanks,
Dan