If you’re confident the schema of all files is consistent, then just infer the 
schema from a single file and reuse it when loading the whole data set:

val schema = spark.read.json("/path/to/single/file.json").schema

val wholeDataSet = spark.read.schema(schema).json("/path/to/whole/datasets")
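
If you'd rather not re-read even that single file on later runs, note that a
StructType round-trips through JSON, so you can serialize the inferred schema
once and reload it afterwards instead of hand-writing the StructType code.
A minimal sketch (the file paths are placeholders):

import org.apache.spark.sql.types.{DataType, StructType}

// Infer the schema once and serialize it to a JSON string
val schemaJson = spark.read.json("/path/to/single/file.json").schema.json

// Persist schemaJson wherever convenient (a plain text file is enough)

// On later runs: restore the schema and skip inference entirely
val schema = DataType.fromJson(schemaJson).asInstanceOf[StructType]
val df = spark.read.schema(schema).json("/path/to/whole/datasets")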


Thanks,
Silvio

On 10/16/17, 10:20 AM, "Jeroen Miller" <bluedasya...@gmail.com> wrote:

    Hello Spark users,
    
    Does anyone know if there is a way to generate the Scala code for a complex 
structure just from the output of dataframe.printSchema?
    
    I have to analyse a significant volume of data and want to explicitly set 
the schema(s) to avoid having to read my (compressed) JSON files multiple 
times. What I have been doing so far is reading a few files, printing the 
schema, and manually writing the code to define the corresponding StructType: 
tedious and error-prone.
    
    I'm sure there is a much better way, but can't find anything about it.
    
    Pointers anyone?
    
    Jeroen
    
    
    
