Sure. I ran the same job with fewer columns; here is the exception:

java.lang.IllegalArgumentException: requirement failed: DataFrame must have the same schema as the relation to which is inserted.
DataFrame schema: StructType(StructField(pixel0,ByteType,true), StructField(pixel1,ByteType,true), StructField(pixel10,ByteType,true), StructField(pixel100,ShortType,true), StructField(pixel101,ShortType,true), StructField(pixel102,ShortType,true), StructField(pixel103,ShortType,true), StructField(pixel105,ShortType,true), StructField(pixel106,ShortType,true), StructField(id,DoubleType,true), StructField(label,ByteType,true), StructField(predict,DoubleType,true))
Relation schema: StructType(StructField(pixel0,ByteType,true), StructField(pixel1,ByteType,true), StructField(pixel10,ByteType,true), StructField(pixel100,ShortType,true), StructField(pixel101,ShortType,true), StructField(pixel102,ShortType,true), StructField(pixel103,ShortType,true), StructField(pixel105,ShortType,true), StructField(pixel106,ShortType,true), StructField(id,DoubleType,true), StructField(label,ByteType,true), StructField(predict,DoubleType,true))
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:113)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)

Regards,
Zsolt

2016-02-12 13:11 GMT+01:00 Ted Yu <yuzhih...@gmail.com>:

> Can you pastebin the full error with all column types?
>
> There should be a difference between some column(s).
>
> Cheers
>
> On Feb 11, 2016, at 2:12 AM, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:
>
> > Hi,
> >
> > I'd like to append a column of one DataFrame to another DF (using Spark 1.5.2):
> >
> > DataFrame outputDF = unlabelledDF.withColumn("predicted_label", predictedDF.col("predicted"));
> >
> > I get the following exception:
> >
> > java.lang.IllegalArgumentException: requirement failed: DataFrame must have the same schema as the relation to which is inserted.
> > DataFrame schema: StructType(StructField(predicted_label,DoubleType,true), ...<other 700 numerical (ByteType/ShortType) columns>
> > Relation schema: StructType(StructField(predicted_label,DoubleType,true), ...<the same 700 columns>
> >
> > The interesting part is that the two schemas in the exception are exactly the same.
> > The same code succeeds with other input data (fewer columns, both numerical and non-numerical).
> > Any idea why this happens?
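A note on the "identical" schemas: StructField's string form prints only the name, data type, and nullability, while schema equality also compares each field's Metadata. ML components often attach attribute metadata to columns such as label or predict, so two schemas can print identically and still fail the require() check in InsertIntoHadoopFsRelation. Below is a minimal sketch of a field-by-field diff that surfaces the hidden part; the SchemaDiff helper and the way the relation's schema is obtained are assumptions for illustration, not part of the original thread.

import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public final class SchemaDiff {
    // Hypothetical helper: report which component of each field differs,
    // including the per-field Metadata that StructField's toString omits.
    public static void diff(StructType left, StructType right) {
        StructField[] l = left.fields();
        StructField[] r = right.fields();
        if (l.length != r.length) {
            System.out.println("field count differs: " + l.length + " vs " + r.length);
        }
        for (int i = 0; i < Math.min(l.length, r.length); i++) {
            if (!l[i].equals(r[i])) {
                System.out.println("mismatch at '" + l[i].name() + "'"
                    + ": sameType=" + l[i].dataType().equals(r[i].dataType())
                    + ", sameNullable=" + (l[i].nullable() == r[i].nullable())
                    + ", sameMetadata=" + l[i].metadata().equals(r[i].metadata()));
            }
        }
    }
}

// Example use; outputPath is a placeholder for wherever the parquet was written:
// SchemaDiff.diff(outputDF.schema(), sqlContext.read().parquet(outputPath).schema());

If the only mismatch turns out to be metadata, rebuilding the DataFrame with a metadata-stripped copy of the schema before the write should let the insert go through.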
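On the original snippet itself: withColumn expects a Column derived from the DataFrame it is called on, so passing predictedDF.col("predicted") into unlabelledDF.withColumn(...) mixes two lineages, which Spark 1.5 does not support in general. The usual workaround is a join on a shared key. A sketch, assuming the id column visible in the schemas above is such a key (unlabelledDF and predictedDF are the variables from the original post):

import org.apache.spark.sql.DataFrame;

// Join-based alternative to the cross-DataFrame withColumn call.
// Assumes both frames carry the shared "id" key shown in the schemas.
DataFrame predictions = predictedDF.select(
        predictedDF.col("id"),
        predictedDF.col("predicted").as("predicted_label"));
DataFrame outputDF = unlabelledDF.join(predictions, "id");

The join(right, usingColumn) overload keeps a single id column in the result, so the output schema matches what the withColumn call was aiming for.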