Re: spark 1.5, ML Pipeline Decision Tree Dataframe Problem

Yasemin Kaya Fri, 18 Sep 2015 12:08:48 -0700

Thanks. I tried to do that, but I couldn't get it to work.
unlabeledTest is a JavaPairRDD<String, Vector>, and the vectors are dense
vectors. I added "import org.apache.spark.sql.SQLContext.implicits$", but
there is no toDF() method available; I am using Java, not Scala.
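
For Java, one possible workaround is to skip the Scala implicits and build
the DataFrame from an RDD of Row objects with an explicit schema that
declares the vector column as VectorUDT. The following is only a minimal
sketch against the Spark 1.5 Java API. The class name UnlabeledToDataFrame,
the helper toDataFrame, the "id" column and the sample data are made up for
illustration; "features" is used because it is the default features column
name in spark.ml.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

// Hypothetical helper class, not part of Spark.
public class UnlabeledToDataFrame {

    // Turns (key, vector) pairs into a two-column DataFrame: "id", "features".
    static DataFrame toDataFrame(SQLContext sqlContext,
                                 JavaPairRDD<String, Vector> unlabeled) {
        // Wrap each pair in a generic Row; the schema below gives it meaning.
        JavaRDD<Row> rows = unlabeled.map(
            new Function<Tuple2<String, Vector>, Row>() {
                @Override
                public Row call(Tuple2<String, Vector> pair) {
                    return RowFactory.create(pair._1(), pair._2());
                }
            });

        // Explicit schema: the vector column is declared with VectorUDT.
        StructType schema = new StructType(new StructField[] {
            new StructField("id", DataTypes.StringType, false, Metadata.empty()),
            new StructField("features", new VectorUDT(), false, Metadata.empty())
        });

        return sqlContext.createDataFrame(rows, schema);
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("UnlabeledToDataFrame")
            .setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // Made-up sample data standing in for unlabeledTest.
        JavaPairRDD<String, Vector> unlabeledTest = sc.parallelizePairs(
            Arrays.asList(
                new Tuple2<String, Vector>("a", Vectors.dense(1.0, 0.0)),
                new Tuple2<String, Vector>("b", Vectors.dense(0.0, 1.0))));

        DataFrame production = toDataFrame(sqlContext, unlabeledTest);
        production.printSchema();
        production.show();

        sc.stop();
    }
}
```

The resulting DataFrame could then be passed to the fitted model's
transform() method, assuming the model reads its input from a column named
"features". Keeping the String key as an "id" column makes it possible to
match predictions back to the original records.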


2015-09-18 20:02 GMT+03:00 Feynman Liang <fli...@databricks.com>:

> What is the type of unlabeledTest?
>
> SQL should be using the VectorUDT we've defined for Vectors
> <https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala#L183>,
> so you should be able to just "import sqlContext.implicits._" and then
> call "rdd.toDF()" on your RDD to convert it into a DataFrame.
>
> On Fri, Sep 18, 2015 at 7:32 AM, Yasemin Kaya <godo...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the *Spark 1.5 ML Pipeline decision tree
>> <http://spark.apache.org/docs/latest/ml-decision-tree.html#output-columns>*
>> to get the tree's predicted probabilities, so I have to convert my data to
>> a DataFrame. Training the model works without problems, but when I apply
>> the model to my data, the conversion to a DataFrame fails. My data is a
>> *JavaPairRDD<String, Vector>*, and I create the DataFrame like this:
>>
>> DataFrame production = sqlContext.createDataFrame(
>> unlabeledTest.values(), Vector.class);
>>
>> *The error is:*
>> Exception in thread "main" java.lang.ClassCastException:
>> org.apache.spark.mllib.linalg.VectorUDT cannot be cast to
>> org.apache.spark.sql.types.StructType
>>
>> I know that if I used LabeledPoint there would be no problem. But the
>> data has no labels; predicting the labels is exactly why I am applying
>> the model to it.
>>
>> Is there a way to handle this problem?
>> Thanks.
>>
>>
>> Best,
>> yasemin
>> --
>> hiç ender hiç
>>
>
>


-- 
hiç ender hiç
