You may be looking for something like "spark.sql.parquet.mergeSchema" for ORC. Unfortunately, I don't think it is available, unless someone tells me I am wrong.
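For reference, this is roughly what the Parquet option being alluded to looks like in use. This is only a sketch: it assumes a SparkSession named `spark` (as in spark-shell) and a hypothetical directory `/data/parquet_mixed` containing Parquet files with differing schemas; it is not something available for ORC in the Spark version discussed here.

```scala
// Sketch only: Parquet schema merging, the feature referenced above.
// Assumes spark-shell (SparkSession `spark`) and a hypothetical path
// /data/parquet_mixed whose files have differing schemas
// (some with only f1, some with both f1 and f2).
val merged = spark.read
  .option("mergeSchema", "true") // per-read equivalent of spark.sql.parquet.mergeSchema
  .parquet("/data/parquet_mixed")

merged.printSchema() // union of all file schemas; fields absent from a file read back as null
```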
You can create a JIRA to request this feature, but we all know that Parquet is the first-class citizen format [😊]

Yong

________________________________
From: Begar, Veena <veena.be...@hpe.com>
Sent: Tuesday, February 14, 2017 10:37 AM
To: smartzjp; user@spark.apache.org
Subject: RE: How to specify default value for StructField?

Thanks, it didn't work, because the folder has files with 2 different schemas. It fails with the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`f2`' given input columns: [f1];

-----Original Message-----
From: smartzjp [mailto:zjp_j...@163.com]
Sent: Tuesday, February 14, 2017 10:32 AM
To: Begar, Veena <veena.be...@hpe.com>; user@spark.apache.org
Subject: Re: How to specify default value for StructField?

You can try the below code.

val df = spark.read.format("orc").load("/user/hos/orc_files_test_together")
df.select("f1", "f2").show

On 2017/2/14, 6:54 AM, "vbegar" <user-return-67879-zjp_jdev=163....@spark.apache.org on behalf of veena.be...@hpe.com> wrote:

>Hello,
>
>I specified a StructType like this:
>
>val mySchema = StructType(Array(StructField("f1", StringType, true),
>  StructField("f2", StringType, true)))
>
>I have many ORC files stored in the HDFS location:
>/user/hos/orc_files_test_together
>
>These files use different schemas: some of them have only the f1 column
>and others have both f1 and f2 columns.
>
>I read the data from these files into a dataframe:
>val df = spark.read.format("orc").schema(mySchema).load("/user/hos/orc_files_test_together")
>
>But now, when I give the following command to see the data, it fails:
>df.show
>
>The error message says the "f2" column doesn't exist.
>
>Since I have specified the nullable attribute as true for the f2 column,
>why does it fail?
>
>Or, is there any way to specify a default value for a StructField?
>
>Because, in an AVRO schema, we can specify the default value in this way
>and can read AVRO files in a folder which has 2 different schemas
>(either only the f1 column, or both f1 and f2 columns):
>
>{
>  "type": "record",
>  "name": "myrecord",
>  "fields":
>  [
>    {
>      "name": "f1",
>      "type": "string",
>      "default": ""
>    },
>    {
>      "name": "f2",
>      "type": "string",
>      "default": ""
>    }
>  ]
>}
>
>Wondering why it doesn't work with ORC files.
>
>thanks.
>
>--
>View this message in context:
>http://apache-spark-user-list.1001560.n3.nabble.com/How-to-specify-default-value-for-StructField-tp28386.html
>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>---------------------------------------------------------------------
>To unsubscribe e-mail: user-unsubscr...@spark.apache.org
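One workaround for the situation above (my own sketch, not something proposed in the thread): load each schema's files separately, add the missing column as a typed null (or an empty string, to mimic the Avro-style `"default": ""`), and union the two frames on a fixed column order. The split directory paths below are hypothetical; the thread's files all sit in one folder, so they would first need to be separated, e.g. by writing the old and new data to different directories.

```scala
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// Hypothetical layout: old files (f1 only) and new files (f1, f2) in
// separate directories. Assumes a SparkSession `spark` (spark-shell).
val oldDf = spark.read.format("orc").load("/user/hos/orc_files_old") // f1 only
val newDf = spark.read.format("orc").load("/user/hos/orc_files_new") // f1 and f2

// Supply the "default" by hand: a null (or lit("")) cast to the declared type.
val oldAligned = oldDf.withColumn("f2", lit(null).cast(StringType))

// union matches columns by position, so select a fixed order on both sides.
val all = newDf.select("f1", "f2").union(oldAligned.select("f1", "f2"))
all.show()
```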
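A side note on the nullable question raised above: `nullable = true` only declares that a column may contain nulls; it does not instruct the reader to synthesize the column when a file lacks it. That defaulting step has to come from the format (as Avro's `"default"` entries do) or from the caller. A minimal, Spark-free sketch of the decision involved, using the field names from this thread:

```scala
// Plain Scala, no Spark required: given the declared schema fields and the
// fields actually present in a file, find which columns would need a
// synthesized default. Avro's reader fills these from "default"; the ORC
// reader discussed here does not.
val declared = Seq("f1", "f2") // fields in mySchema
val inFile   = Seq("f1")       // fields in an old ORC file

val needDefault = declared.filterNot(inFile.contains)
println(needDefault) // List(f2)
```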