Re: Generating StructType from dataframe.printSchema
On 16 Oct 2017, at 16:22, Silvio Fiorito wrote:

> [...] then just infer the schema from a single file and reuse it when loading
> the whole data set:

Well, that is a possibility indeed.

Thanks,
Jeroen

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Generating StructType from dataframe.printSchema
If you’re confident the schema of all files is consistent, then just infer the schema from a single file and reuse it when loading the whole data set:

    val schema = spark.read.json("/path/to/single/file.json").schema
    val wholeDataSet = spark.read.schema(schema).json("/path/to/whole/datasets")

Thanks,
Silvio

On 10/16/17, 10:20 AM, "Jeroen Miller" wrote:

Hello Spark users,

Does anyone know if there is a way to generate the Scala code for a complex structure just from the output of dataframe.printSchema?

I have to analyse a significant volume of data and want to explicitly set the schema(s) to avoid having to read my (compressed) JSON files multiple times.

What I am doing so far is to read a few files, print the schema, and manually write the code to define the corresponding StructType: tedious and error-prone.

I'm sure there is a much better way, but can't find anything about it. Pointers anyone?

Jeroen
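
A variation on Silvio's suggestion, in case the inferred schema needs to survive across jobs: a StructType can be serialized to a JSON string with its .json method and rebuilt with DataType.fromJson, so the result of the one-time inference can be stored in a plain text file rather than hand-written as Scala code. A rough sketch (the file paths are placeholders, and `spark` is an existing SparkSession):

    import org.apache.spark.sql.types.{DataType, StructType}

    // One-time step: infer the schema from a single representative file
    // and keep its JSON representation (e.g. write schemaJson to a file).
    val schema: StructType = spark.read.json("/path/to/single/file.json").schema
    val schemaJson: String = schema.json

    // Subsequent runs: rebuild the StructType from the stored JSON
    // without touching the data at all, then load with it.
    val restored = DataType.fromJson(schemaJson).asInstanceOf[StructType]
    val wholeDataSet = spark.read.schema(restored).json("/path/to/whole/datasets")

This avoids both repeated inference passes and the error-prone manual transcription of printSchema output.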
Generating StructType from dataframe.printSchema
Hello Spark users,

Does anyone know if there is a way to generate the Scala code for a complex structure just from the output of dataframe.printSchema?

I have to analyse a significant volume of data and want to explicitly set the schema(s) to avoid having to read my (compressed) JSON files multiple times.

What I am doing so far is to read a few files, print the schema, and manually write the code to define the corresponding StructType: tedious and error-prone.

I'm sure there is a much better way, but can't find anything about it. Pointers anyone?

Jeroen
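
For context, the manual approach described above means transcribing printSchema output into nested StructType/StructField declarations by hand. For a hypothetical record like {"id": 1, "user": {"name": "a", "tags": ["x"]}} (field names invented purely for illustration), that looks like:

    import org.apache.spark.sql.types._

    // Hand-written equivalent of the schema Spark would infer
    // for the hypothetical JSON record above.
    val schema = StructType(Seq(
      StructField("id", LongType, nullable = true),
      StructField("user", StructType(Seq(
        StructField("name", StringType, nullable = true),
        StructField("tags", ArrayType(StringType, containsNull = true),
          nullable = true)
      )), nullable = true)
    ))

Even at this size the nesting is easy to get wrong, which is exactly the tedium the question is about.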