Hi, thanks for the answers. If joining the DataFrames is the solution, then why does the simple withColumn() succeed for some datasets and fail for others?
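
For reference, here is a rough Java sketch of the join approach (untested; it assumes both DataFrames share an "id" key column, as in Michał's suggestion, and renames the joined column to "predicted_label" to match the original withColumn() target):

    import org.apache.spark.sql.DataFrame;

    // Keep only the key and the prediction from predictedDF, then join on "id".
    // join(right, usingColumn) keeps a single "id" column in the result.
    DataFrame joined = unlabelledDF.join(predictedDF.select("id", "predicted"), "id");

    // Rename to match the column name used in the original withColumn() attempt.
    DataFrame outputDF = joined.withColumnRenamed("predicted", "predicted_label");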
2016-02-11 11:53 GMT+01:00 Michał Zieliński <zielinski.mich...@gmail.com>:

> I think a good idea would be to do a join:
>
> outputDF = unlabelledDF.join(predictedDF.select("id", "predicted"), "id")
>
> On 11 February 2016 at 10:12, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:
>
>> Hi,
>>
>> I'd like to append a column of a DataFrame to another DF (using Spark
>> 1.5.2):
>>
>> DataFrame outputDF = unlabelledDF.withColumn("predicted_label",
>> predictedDF.col("predicted"));
>>
>> I get the following exception:
>>
>> java.lang.IllegalArgumentException: requirement failed: DataFrame must
>> have the same schema as the relation to which is inserted.
>> DataFrame schema:
>> StructType(StructField(predicted_label,DoubleType,true), ...<other 700
>> numerical (ByteType/ShortType) columns>
>> Relation schema: StructType(StructField(predicted_label,DoubleType,true),
>> ...<the same 700 columns>
>>
>> The interesting part is that the two schemas in the exception are exactly
>> the same.
>> The same code with other input data (with fewer columns, both numerical
>> and non-numerical) succeeds.
>> Any idea why this happens?