Hmm, I get the same error with master. Here is another test example
that fails. This time I explicitly create a Row RDD, which corresponds
to my actual use case:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.{Row, SQLContext}

object TestDataFrame {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TestDataFrame").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val data = Seq(LabeledPoint(1, Vectors.zeros(10)))
    val dataDF = sc.parallelize(data).toDF
    dataDF.printSchema()
    dataDF.save("test1.parquet") // OK

    // Build the Row RDD by hand, keeping the Vector as a user object
    val dataRow = data.map { case LabeledPoint(l: Double, f: mllib.linalg.Vector) => Row(l, f) }
    val dataRowRDD = sc.parallelize(dataRow)
    val dataDF2 = sqlContext.createDataFrame(dataRowRDD, dataDF.schema)
    dataDF2.printSchema()
    dataDF2.saveAsParquetFile("test3.parquet") // FAIL!!!
  }
}
On Tue, Mar 31, 2015 at 11:18 PM, Xiangrui Meng wrote:
> I cannot reproduce this error on master, but I'm not aware of any
> recent bug fixes that are related. Could you build and try the current
> master? -Xiangrui
>
> On Tue, Mar 31, 2015 at 4:10 AM, Jaonary Rabarisoa wrote:
> > Hi all,
> >
> > A DataFrame with a user-defined type (here mllib.Vector) created with
> > sqlContext.createDataFrame can't be saved to a Parquet file; it raises
> > ClassCastException: org.apache.spark.mllib.linalg.DenseVector cannot be
> > cast to org.apache.spark.sql.Row.
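The exception text suggests the Parquet writer received a raw DenseVector where the schema's user-defined-type column expected the value already converted to its Row (struct) form. The toy model below illustrates that mismatch in plain Scala with no Spark dependency; MyVector, MyVectorUDT and ToyStructWriter are hypothetical names, and this is not how Spark is actually implemented.

```scala
// Toy model, NOT Spark's implementation; all names here are hypothetical.

// A user-defined class, standing in for mllib's DenseVector.
final case class MyVector(values: Array[Double])

// Converts the user object to and from its "SQL" form: a struct, modelled
// here as a Seq[Any] of (size, values), standing in for a Row.
object MyVectorUDT {
  def serialize(v: MyVector): Seq[Any] = Seq(v.values.length, v.values.toSeq)

  def deserialize(datum: Any): MyVector = datum match {
    case Seq(_, vs: Seq[_]) => MyVector(vs.map(_.asInstanceOf[Double]).toArray)
    case other => throw new IllegalArgumentException(s"not a serialized MyVector: $other")
  }
}

// A writer that trusts the schema: a UDT column is assumed to already hold
// the serialized struct, so it casts without checking.
object ToyStructWriter {
  def writeStruct(value: Any): String = {
    val struct = value.asInstanceOf[Seq[Any]] // ClassCastException for a raw MyVector
    struct.mkString("[", ", ", "]")
  }
}
```

Passing a raw MyVector to writeStruct fails with a ClassCastException, mirroring the reported error; passing MyVectorUDT.serialize(v) instead succeeds, which is the conversion a UDT-aware createDataFrame would be expected to perform.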
> >
> > Here is code that reproduces this error:
> >
> > import org.apache.spark.{SparkConf, SparkContext}
> > import org.apache.spark.mllib.linalg.Vectors
> > import org.apache.spark.mllib.regression.LabeledPoint
> > import org.apache.spark.sql.SQLContext
> >
> > object TestDataFrame {
> >
> >   def main(args: Array[String]): Unit = {
> >     val conf = new SparkConf()
> >       .setAppName("RankingEval")
> >       .setMaster("local[8]")
> >       .set("spark.executor.memory", "6g")
> >
> >     val sc = new SparkContext(conf)
> >     val sqlContext = new SQLContext(sc)
> >
> >     import sqlContext.implicits._
> >
> >     val data = sc.parallelize(Seq(LabeledPoint(1, Vectors.zeros(10))))
> >     val dataDF = data.toDF
> >
> >     dataDF.save("test1.parquet")
> >
> >     // Rebuild the DataFrame from its own Row RDD and schema
> >     val dataDF2 = sqlContext.createDataFrame(dataDF.rdd, dataDF.schema)
> >
> >     dataDF2.save("test2.parquet") // fails with the ClassCastException
> >   }
> > }
> >
> >
> > Is this related to https://issues.apache.org/jira/browse/SPARK-5532, and
> > how can it be solved?
> >
> >
> > Cheers,
> >
> >
> > Jao
>