> > Each json file is of a single object and has the potential to have variance in the schema.
>
> How much variance are we talking? JSON->Parquet is going to do well with 100s of different columns, but at 10,000s many things will probably start breaking.
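One rough way to answer the "how much variance" question before attempting a conversion is to count the distinct field paths across a sample of the files. The sketch below is only an illustration using the Python standard library, not Spark's own schema inference. The helper names `flatten_keys` and `column_frequencies` and the sample documents are hypothetical, and it assumes each document is a single JSON object whose nested objects would become dotted column names.

```python
import json
from collections import Counter


def flatten_keys(obj, prefix=""):
    """Yield a dotted path for every leaf field of a decoded JSON object.

    Hypothetical helper: nested objects are expanded, while arrays and
    scalars are treated as a single column.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict) and value:
                yield from flatten_keys(value, path)
            else:
                yield path


def column_frequencies(json_docs):
    """Count in how many documents each field path appears."""
    counts = Counter()
    for doc in json_docs:
        # set() so a path is counted at most once per document
        counts.update(set(flatten_keys(json.loads(doc))))
    return counts


# Made-up sample documents, one JSON object each.
docs = [
    '{"id": 1, "user": {"name": "a"}}',
    '{"id": 2, "user": {"name": "b", "age": 30}}',
    '{"id": 3, "tags": ["x"]}',
]

freq = column_frequencies(docs)
print(len(freq), "distinct columns in the merged schema")
for path, n in freq.most_common():
    print(f"{path}: present in {n} of {len(docs)} documents")
```

If the distinct-column count on a representative sample stays in the hundreds, the data is in the range described above as working well. A count that keeps growing with every new file is a sign the merged schema may reach the problematic tens of thousands.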
- Spark SQL / Parquet - Dynamic Schema detection Anthony Andras
- Re: Spark SQL / Parquet - Dynamic Schema detection Michael Armbrust
