Sure. I ran the same job with fewer columns; here is the exception:

java.lang.IllegalArgumentException: requirement failed: DataFrame must have the same schema as the relation to which is inserted.
DataFrame schema: StructType(StructField(pixel0,ByteType,true), StructField(pixel1,ByteType,true), StructField(pixel10,ByteType,true), StructField(pixel100,ShortType,true), StructField(pixel101,ShortType,true), StructField(pixel102,ShortType,true), StructField(pixel103,ShortType,true), StructField(pixel105,ShortType,true), StructField(pixel106,ShortType,true), StructField(id,DoubleType,true), StructField(label,ByteType,true), StructField(predict,DoubleType,true))
Relation schema: StructType(StructField(pixel0,ByteType,true), StructField(pixel1,ByteType,true), StructField(pixel10,ByteType,true), StructField(pixel100,ShortType,true), StructField(pixel101,ShortType,true), StructField(pixel102,ShortType,true), StructField(pixel103,ShortType,true), StructField(pixel105,ShortType,true), StructField(pixel106,ShortType,true), StructField(id,DoubleType,true), StructField(label,ByteType,true), StructField(predict,DoubleType,true))
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:113)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)

Regards,
Zsolt

2016-02-12 13:11 GMT+01:00 Ted Yu <yuzhih...@gmail.com>:

> Can you pastebin the full error with all column types?
>
> There should be a difference between some column(s).
>
> Cheers
>
> On Feb 11, 2016, at 2:12 AM, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:
>
> > Hi,
> >
> > I'd like to append a column of one DataFrame to another DF (using Spark 1.5.2):
> >
> > DataFrame outputDF = unlabelledDF.withColumn("predicted_label", predictedDF.col("predicted"));
> >
> > I get the following exception:
> >
> > java.lang.IllegalArgumentException: requirement failed: DataFrame must have the same schema as the relation to which is inserted.
> > DataFrame schema: StructType(StructField(predicted_label,DoubleType,true), ...<other 700 numerical (ByteType/ShortType) columns>
> > Relation schema: StructType(StructField(predicted_label,DoubleType,true), ...<the same 700 columns>
> >
> > The interesting part is that the two schemas in the exception are exactly the same.
> > The same code succeeds with other input data (fewer columns, both numerical and non-numerical).
> > Any idea why this happens?
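A note on the "identical" schemas: StructField's string form prints only the name, data type, and nullability, while schema equality also compares each field's Metadata. ML components often attach attribute metadata to columns such as label or predict, so two schemas can print identically and still fail the require() check in InsertIntoHadoopFsRelation. Below is a minimal sketch of a field-by-field diff that surfaces the hidden part; the SchemaDiff helper and the way the relation's schema is obtained are assumptions for illustration, not part of the original thread.

import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public final class SchemaDiff {
    // Hypothetical helper: report which component of each field differs,
    // including the per-field Metadata that StructField's toString omits.
    public static void diff(StructType left, StructType right) {
        StructField[] l = left.fields();
        StructField[] r = right.fields();
        if (l.length != r.length) {
            System.out.println("field count differs: " + l.length + " vs " + r.length);
        }
        for (int i = 0; i < Math.min(l.length, r.length); i++) {
            if (!l[i].equals(r[i])) {
                System.out.println("mismatch at '" + l[i].name() + "'"
                    + ": sameType=" + l[i].dataType().equals(r[i].dataType())
                    + ", sameNullable=" + (l[i].nullable() == r[i].nullable())
                    + ", sameMetadata=" + l[i].metadata().equals(r[i].metadata()));
            }
        }
    }
}

// Example use; outputPath is a placeholder for wherever the parquet was written:
// SchemaDiff.diff(outputDF.schema(), sqlContext.read().parquet(outputPath).schema());

If the only mismatch turns out to be metadata, rebuilding the DataFrame with a metadata-stripped copy of the schema before the write should let the insert go through.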
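On the original snippet itself: withColumn expects a Column derived from the DataFrame it is called on, so passing predictedDF.col("predicted") into unlabelledDF.withColumn(...) mixes two lineages, which Spark 1.5 does not support in general. The usual workaround is a join on a shared key. A sketch, assuming the id column visible in the schemas above is such a key (unlabelledDF and predictedDF are the variables from the original post):

import org.apache.spark.sql.DataFrame;

// Join-based alternative to the cross-DataFrame withColumn call.
// Assumes both frames carry the shared "id" key shown in the schemas.
DataFrame predictions = predictedDF.select(
        predictedDF.col("id"),
        predictedDF.col("predicted").as("predicted_label"));
DataFrame outputDF = unlabelledDF.join(predictions, "id");

The join(right, usingColumn) overload keeps a single id column in the result, so the output schema matches what the withColumn call was aiming for.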