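import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, StructType}

// `schema` holds the JSON string exported earlier from the DataFrame's schema.
val test_schema = DataType.fromJson(schema).asInstanceOf[StructType]
// SparkHelper is a project-local helper (not part of Spark) that returns a SparkSession.
val session = SparkHelper.getSparkSession
val df1: DataFrame = session.read
  .format("json")
  .schema(test_schema)
  .option("inferSchema","false")
  .option("mode","FAILFAST")
  .load("src/test/resources/*.gz")
df1.show(80)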
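For anyone following along, the JSON round trip behind the snippet above can be sketched roughly as follows. This is only a minimal sketch, not the exact code used in this thread: it assumes spark-sql is on the classpath, and the schema and its field names (`action`, `details`, `newValue`) are hypothetical stand-ins for `df1.schema`. The export step uses `StructType.json` and the import step uses `DataType.fromJson`.

```scala
import org.apache.spark.sql.types.{DataType, StringType, StructField, StructType}

object SchemaJsonRoundTrip {
  // Hypothetical schema standing in for df1.schema; the field names are made up.
  val original: StructType = StructType(Seq(
    StructField("action", StringType, nullable = true),
    StructField("details", StructType(Seq(
      StructField("newValue", StringType, nullable = true)
    )), nullable = true)
  ))

  // Export: a compact JSON string that can be written to a file and edited by hand.
  val exported: String = original.json

  // Import: DataType.fromJson returns a DataType, so cast it back to StructType.
  val restored: StructType = DataType.fromJson(exported).asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    // prettyJson is the indented form, which is easier to read and edit.
    println(original.prettyJson)
    // Check that nothing was lost on the way through JSON.
    println(restored == original)
  }
}
```

Unlike the toString and catalogString forms, the JSON form also carries nullability and metadata, which is probably why it reloads cleanly.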
On Wed, Mar 28, 2018 at 5:10 PM, Colin Williams <colin.williams.seat...@gmail.com> wrote:
> I've had more success exporting the schema to JSON and importing that.
> Something like:
>
> val df1: DataFrame = session.read
>   .format("json")
>   .schema(test_schema)
>   .option("inferSchema","false")
>   .option("mode","FAILFAST")
>   .load("src/test/resources/*.gz")
> df1.show(80)
>
> On Wed, Mar 28, 2018 at 3:25 PM, Colin Williams
> <colin.williams.seat...@gmail.com> wrote:
>> The toString representation looks like this, where each "someName" is unique:
>>
>> StructType(StructField("someName",StringType,true),
>> StructField("someName",StructType(StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",
>> StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",
>> StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType, true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",
>> StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",
>> StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType,true)),true),
>> StructField("someName",StructType(StructField("someName",StringType,true),
>> StructField("someName",StringType, true)),true)),true),
>> StructField("someName",BooleanType,true),
>> StructField("someName",LongType,true),
>> StructField("someName",StringType,true),
>> StructField("someName",StringType,true),
>> StructField("someName",StringType,true),
>> StructField("someName",StringType,true))
>>
>> The catalogString looks something like this, where each SOME_TABLE_NAME is unique:
>>
>> struct<action:string,SOME_TABLE_NAME:struct<SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>> SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string>,
>> SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,
>> SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:
>> struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:
>> string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:
>> string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>> SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,
>> SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:
>> struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:
>> string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:
>> string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,
>> SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,
>> SOME_TABLE_NAME:string>,SOME_TABLE_NAME:struct<newValue:string,SOME_TABLE_NAME:string>,SOME_TABLE_NAME:
>> struct<newValue:string,SOME_TABLE_NAME:string>>,SOME_TABLE_NAME:boolean,SOME_TABLE_NAME:bigint,
>> SOME_TABLE_NAME:string,SOME_TABLE_NAME:string,SOME_TABLE_NAME:string,SOME_TABLE_NAME:string>
>>
>> On Wed, Mar 28, 2018 at 2:32 PM, Colin Williams
>> <colin.williams.seat...@gmail.com> wrote:
>>> I've been learning spark-sql and have been trying to export and import
>>> some of the generated schemas to edit them.
>>> I've been writing the schemas to strings with df1.schema.toString() and
>>> df.schema.catalogString.
>>>
>>> But I've been having trouble loading the schemas created. Does anyone
>>> know if it's possible to work with the catalogString? I couldn't find
>>> many resources on working with it. Is it possible to create a schema
>>> from this string and load from it using the SparkSession?
>>>
>>> Similarly, I haven't yet successfully loaded the toString schema after
>>> some small edits...
>>>
>>> There's a little tidbit about some of this here:
>>> https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-DataType.html
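On the catalogString question in the original message, here is one possible approach, offered as a sketch under assumptions rather than a confirmed answer from this thread. It assumes a Spark version that provides `DataType.fromDDL` (I believe 2.4 and later, so possibly newer than what was in use here) and uses a hypothetical schema with made-up field names. The idea is that `fromDDL` can parse a type string of the `struct<name:type,...>` form that `catalogString` produces.

```scala
import org.apache.spark.sql.types.{DataType, StringType, StructField, StructType}

object CatalogStringRoundTrip {
  // Hypothetical schema; the field names are made up for illustration.
  val original: StructType = StructType(Seq(
    StructField("action", StringType),
    StructField("details", StructType(Seq(
      StructField("newValue", StringType)
    )))
  ))

  // Export: a compact type string of the form struct<name:type,...>.
  val exported: String = original.catalogString

  // Import: assumes DataType.fromDDL exists in the Spark version in use.
  // It returns a DataType, so cast it back to StructType.
  val restored: StructType = DataType.fromDDL(exported).asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    println(exported)
    println(restored.treeString)
  }
}
```

Two caveats with this route: catalogString does not record nullability or metadata, so parsed fields come back nullable, and field names are not quoted, so names containing special characters may not parse. The JSON form avoids both problems.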