Re: Inserting column to DataFrame

Ted Yu Fri, 12 Feb 2016 04:28:25 -0800

This looks like a bug.

I suggest filing an issue with a code snippet if it can be reproduced on the
1.6 branch.
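
When preparing a reproduction, a field-by-field comparison of the two schemas may help narrow things down. The block below is only a minimal sketch in plain Java. It assumes, without confirmation from this thread, that the schemas differ in something the printed form leaves out, such as per-field metadata. The `Field` class and the `diff` helper are hypothetical stand-ins for illustration, not Spark's `StructField` API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class SchemaDiff {

    // Hypothetical stand-in for a schema field (NOT Spark's StructField).
    // Assumption: equality covers metadata, but toString() does not print it.
    public static final class Field {
        public final String name;
        public final String type;
        public final boolean nullable;
        public final Map<String, String> metadata;

        public Field(String name, String type, boolean nullable, Map<String, String> metadata) {
            this.name = name;
            this.type = type;
            this.nullable = nullable;
            this.metadata = metadata;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Field)) {
                return false;
            }
            Field other = (Field) o;
            return name.equals(other.name)
                    && type.equals(other.type)
                    && nullable == other.nullable
                    && metadata.equals(other.metadata);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, type, nullable, metadata);
        }

        // Metadata is deliberately omitted, so unequal fields can print identically.
        @Override
        public String toString() {
            return "Field(" + name + "," + type + "," + nullable + ")";
        }
    }

    // Returns one description per position where the two schemas are not equal,
    // plus a leading entry if the field counts differ.
    public static List<String> diff(List<Field> left, List<Field> right) {
        List<String> out = new ArrayList<>();
        if (left.size() != right.size()) {
            out.add("field count: " + left.size() + " vs " + right.size());
        }
        int n = Math.min(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            Field a = left.get(i);
            Field b = right.get(i);
            if (!a.equals(b)) {
                out.add("field " + i + ": " + a + " metadata=" + a.metadata
                        + " vs " + b + " metadata=" + b.metadata);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two single-field schemas that print the same but carry different metadata.
        List<Field> dataFrameSchema =
                List.of(new Field("predict", "DoubleType", true, Map.of()));
        List<Field> relationSchema =
                List.of(new Field("predict", "DoubleType", true, Map.of("origin", "model")));
        for (String line : diff(dataFrameSchema, relationSchema)) {
            System.out.println(line);
        }
    }
}
```

If a comparison like this reports a mismatch on fields that print identically, the differing attribute would be worth including in the issue.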


Cheers

On Fri, Feb 12, 2016 at 4:25 AM, Zsolt Tóth <toth.zsolt....@gmail.com>
wrote:

> Sure. I ran the same job with fewer columns; the exception is:
>
> java.lang.IllegalArgumentException: requirement failed: DataFrame must have 
> the same schema as the relation to which is inserted.
> DataFrame schema: StructType(StructField(pixel0,ByteType,true), 
> StructField(pixel1,ByteType,true), StructField(pixel10,ByteType,true), 
> StructField(pixel100,ShortType,true), StructField(pixel101,ShortType,true), 
> StructField(pixel102,ShortType,true), StructField(pixel103,ShortType,true), 
> StructField(pixel105,ShortType,true), StructField(pixel106,ShortType,true), 
> StructField(id,DoubleType,true), StructField(label,ByteType,true), 
> StructField(predict,DoubleType,true))
> Relation schema: StructType(StructField(pixel0,ByteType,true), 
> StructField(pixel1,ByteType,true), StructField(pixel10,ByteType,true), 
> StructField(pixel100,ShortType,true), StructField(pixel101,ShortType,true), 
> StructField(pixel102,ShortType,true), StructField(pixel103,ShortType,true), 
> StructField(pixel105,ShortType,true), StructField(pixel106,ShortType,true), 
> StructField(id,DoubleType,true), StructField(label,ByteType,true), 
> StructField(predict,DoubleType,true))
>
>       at scala.Predef$.require(Predef.scala:233)
>       at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:113)
>       at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
>       at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
>       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
>       at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
>       at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
>       at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
>       at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:69)
>       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:140)
>       at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:138)
>       at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>       at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:138)
>       at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:933)
>       at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:933)
>       at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:197)
>       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:146)
>       at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:137)
>       at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:304)
>
> Regards,
>
> Zsolt
>
>
> 2016-02-12 13:11 GMT+01:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Can you pastebin the full error with all column types?
>>
>> There should be a difference between some column(s).
>>
>> Cheers
>>
>> > On Feb 11, 2016, at 2:12 AM, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I'd like to append a column of a dataframe to another DF (using Spark 1.5.2):
>> >
>> > DataFrame outputDF = unlabelledDF.withColumn("predicted_label", predictedDF.col("predicted"));
>> >
>> > I get the following exception:
>> >
>> > java.lang.IllegalArgumentException: requirement failed: DataFrame must have the same schema as the relation to which is inserted.
>> > DataFrame schema: StructType(StructField(predicted_label,DoubleType,true), ...<other 700 numerical (ByteType/ShortType) columns>
>> > Relation schema: StructType(StructField(predicted_label,DoubleType,true), ...<the same 700 columns>
>> >
>> > The interesting part is that the two schemas in the exception are exactly the same.
>> > The same code with other input data (with fewer columns, both numerical and non-numerical) succeeds.
>> > Any idea why this happens?
>> >
>>
>
>
