I reproduced the bug on master and submitted a patch for it: https://github.com/apache/spark/pull/5329. It may get into Spark 1.3.1. Thanks for reporting the bug! -Xiangrui
On Wed, Apr 1, 2015 at 12:57 AM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
> Hmm, I got the same error with master. Here is another test example that
> fails. Here, I explicitly create a Row RDD, which corresponds to my use case:
>
> import org.apache.spark.{SparkConf, SparkContext, mllib}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.{Row, SQLContext}
>
> object TestDataFrame {
>
>   def main(args: Array[String]): Unit = {
>
>     val conf = new SparkConf().setAppName("TestDataFrame").setMaster("local[4]")
>     val sc = new SparkContext(conf)
>     val sqlContext = new SQLContext(sc)
>
>     import sqlContext.implicits._
>
>     val data = Seq(LabeledPoint(1, Vectors.zeros(10)))
>     val dataDF = sc.parallelize(data).toDF
>
>     dataDF.printSchema()
>     dataDF.save("test1.parquet") // OK
>
>     val dataRow = data.map { case LabeledPoint(l: Double, f: mllib.linalg.Vector) =>
>       Row(l, f)
>     }
>
>     val dataRowRDD = sc.parallelize(dataRow)
>     val dataDF2 = sqlContext.createDataFrame(dataRowRDD, dataDF.schema)
>
>     dataDF2.printSchema()
>
>     dataDF2.saveAsParquetFile("test3.parquet") // FAIL !!!
>   }
> }
>
>
> On Tue, Mar 31, 2015 at 11:18 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>> I cannot reproduce this error on master, but I'm not aware of any
>> recent bug fixes that are related. Could you build and try the current
>> master? -Xiangrui
>>
>> On Tue, Mar 31, 2015 at 4:10 AM, Jaonary Rabarisoa <jaon...@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > A DataFrame with a user-defined type (here mllib.Vector) created with
>> > sqlContext.createDataFrame can't be saved to a parquet file and raises a
>> > ClassCastException: org.apache.spark.mllib.linalg.DenseVector cannot be
>> > cast to org.apache.spark.sql.Row error.
>> >
>> > Here is an example of code to reproduce this error:
>> >
>> > import org.apache.spark.{SparkConf, SparkContext}
>> > import org.apache.spark.mllib.linalg.Vectors
>> > import org.apache.spark.mllib.regression.LabeledPoint
>> > import org.apache.spark.sql.SQLContext
>> >
>> > object TestDataFrame {
>> >
>> >   def main(args: Array[String]): Unit = {
>> >     //System.loadLibrary(Core.NATIVE_LIBRARY_NAME)
>> >     val conf = new SparkConf().setAppName("RankingEval").setMaster("local[8]")
>> >       .set("spark.executor.memory", "6g")
>> >
>> >     val sc = new SparkContext(conf)
>> >     val sqlContext = new SQLContext(sc)
>> >
>> >     import sqlContext.implicits._
>> >
>> >     val data = sc.parallelize(Seq(LabeledPoint(1, Vectors.zeros(10))))
>> >     val dataDF = data.toDF
>> >
>> >     dataDF.save("test1.parquet")
>> >
>> >     val dataDF2 = sqlContext.createDataFrame(dataDF.rdd, dataDF.schema)
>> >
>> >     dataDF2.save("test2.parquet")
>> >   }
>> > }
>> >
>> >
>> > Is this related to https://issues.apache.org/jira/browse/SPARK-5532, and
>> > how can it be solved?
>> >
>> >
>> > Cheers,
>> >
>> >
>> > Jao