>
> Each JSON file contains a single object, and the schema can vary from
> file to file.
>
How much variance are we talking? JSON->Parquet is going to do well with
100s of different columns, but at 10,000s many things will probably start
breaking.
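One cheap way to answer that question before pointing Spark at the data is to scan the files and take the union of their top-level keys, which approximates the column count Spark's schema inference (and the resulting Parquet schema) will produce. A stdlib-only sketch, assuming each file holds a single JSON object as described above; the file paths are hypothetical:

```python
import json
from pathlib import Path


def distinct_columns(paths):
    """Union of top-level keys across single-object JSON files.

    A rough proxy for how many columns schema inference will
    merge into the final Parquet schema.
    """
    columns = set()
    for path in paths:
        with open(path) as f:
            obj = json.load(f)
        # Each file is assumed to be one JSON object, per the thread.
        columns.update(obj.keys())
    return columns


# Hypothetical usage against a directory of JSON files:
# cols = distinct_columns(Path("data").glob("*.json"))
# print(len(cols))
```

If the count comes back in the hundreds, plain `spark.read.json` plus `write.parquet` should be workable; in the tens of thousands, it is worth normalizing the schemas first.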
Hello there,
I am trying to write a Spark program that loads multiple JSON files
(with undefined schemas) into a DataFrame and then writes it out to a
Parquet file. When doing so, I am running into a number of garbage
collection issues because my JVM is running out of heap space