You may be looking for something like "spark.sql.parquet.mergeSchema" for ORC. Unfortunately, I don't think it is available, unless someone tells me I am wrong.
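For reference, this is roughly what the Parquet option being alluded to looks like in use. This is only a sketch: it assumes a SparkSession named `spark` (as in spark-shell) and a hypothetical directory `/data/parquet_mixed` containing Parquet files with differing schemas; it is not something available for ORC in the Spark version discussed here.

```scala
// Sketch only: Parquet schema merging, the feature referenced above.
// Assumes spark-shell (SparkSession `spark`) and a hypothetical path
// /data/parquet_mixed whose files have differing schemas
// (some with only f1, some with both f1 and f2).
val merged = spark.read
  .option("mergeSchema", "true") // per-read equivalent of spark.sql.parquet.mergeSchema
  .parquet("/data/parquet_mixed")

merged.printSchema() // union of all file schemas; fields absent from a file read back as null
```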
You can create a JIRA to request this feature, but we all know that Parquet is the first-class citizen format [😊]

Yong

________________________________
From: Begar, Veena <veena.be...@hpe.com>
Sent: Tuesday, February 14, 2017 10:37 AM
To: smartzjp; user@spark.apache.org
Subject: RE: How to specify default value for StructField?

Thanks, it didn't work, because the folder has files with 2 different schemas. It fails with the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`f2`' given input columns: [f1];

-----Original Message-----
From: smartzjp [mailto:zjp_j...@163.com]
Sent: Tuesday, February 14, 2017 10:32 AM
To: Begar, Veena <veena.be...@hpe.com>; user@spark.apache.org
Subject: Re: How to specify default value for StructField?

You can try the below code.

val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
df.select("f1", "f2").show

On 2017/2/14, 6:54 AM, "vbegar" <user-return-67879-zjp_jdev=163....@spark.apache.org on behalf of veena.be...@hpe.com> wrote:

>Hello,
>
>I specified a StructType like this:
>
>val mySchema = StructType(Array(StructField("f1", StringType, true),
>  StructField("f2", StringType, true)))
>
>I have many ORC files stored in the HDFS location:
>/user/hos/orc_files_test_together
>
>These files use different schemas: some of them have only the f1 column
>and others have both f1 and f2 columns.
>
>I read the data from these files into a dataframe:
>val df = spark.read.format("orc").schema(mySchema).load("/user/hos/orc_files_test_together")
>
>But now, when I give the following command to see the data, it fails:
>df.show
>
>The error message says the "f2" column doesn't exist.
>
>Since I have specified the nullable attribute as true for the f2 column,
>why does it fail?
>
>Or, is there any way to specify a default value for a StructField?
>
>Because, in an AVRO schema, we can specify the default value in this way
>and can read AVRO files in a folder which has 2 different schemas
>(either only the f1 column, or both f1 and f2 columns):
>
>{
>  "type": "record",
>  "name": "myrecord",
>  "fields":
>  [
>    {
>      "name": "f1",
>      "type": "string",
>      "default": ""
>    },
>    {
>      "name": "f2",
>      "type": "string",
>      "default": ""
>    }
>  ]
>}
>
>Wondering why it doesn't work with ORC files.
>
>thanks.
>
>--
>View this message in context:
>http://apache-spark-user-list.1001560.n3.nabble.com/How-to-specify-default-value-for-StructField-tp28386.html
>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>---------------------------------------------------------------------
>To unsubscribe e-mail: user-unsubscr...@spark.apache.org
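One workaround for the situation above (my own sketch, not something proposed in the thread): load each schema's files separately, add the missing column as a typed null (or an empty string, to mimic the Avro-style `"default": ""`), and union the two frames on a fixed column order. The split directory paths below are hypothetical; the thread's files all sit in one folder, so they would first need to be separated, e.g. by writing the old and new data to different directories.

```scala
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// Hypothetical layout: old files (f1 only) and new files (f1, f2) in
// separate directories. Assumes a SparkSession `spark` (spark-shell).
val oldDf = spark.read.format("orc").load("/user/hos/orc_files_old") // f1 only
val newDf = spark.read.format("orc").load("/user/hos/orc_files_new") // f1 and f2

// Supply the "default" by hand: a null (or lit("")) cast to the declared type.
val oldAligned = oldDf.withColumn("f2", lit(null).cast(StringType))

// union matches columns by position, so select a fixed order on both sides.
val all = newDf.select("f1", "f2").union(oldAligned.select("f1", "f2"))
all.show()
```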
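A side note on the nullable question raised above: `nullable = true` only declares that a column may contain nulls; it does not instruct the reader to synthesize the column when a file lacks it. That defaulting step has to come from the format (as Avro's `"default"` entries do) or from the caller. A minimal, Spark-free sketch of the decision involved, using the field names from this thread:

```scala
// Plain Scala, no Spark required: given the declared schema fields and the
// fields actually present in a file, find which columns would need a
// synthesized default. Avro's reader fills these from "default"; the ORC
// reader discussed here does not.
val declared = Seq("f1", "f2") // fields in mySchema
val inFile   = Seq("f1")       // fields in an old ORC file

val needDefault = declared.filterNot(inFile.contains)
println(needDefault) // List(f2)
```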