I'd make sure you're selecting the correct columns.  If not that, then your
input data might be corrupt.
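
For example, a rough sketch that drops null rows before converting. The
names here are assumptions based on the thread below (dataDF is the full
DataFrame, with the label in column 0 and numeric features after it):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// A null field makes row.get(i).toString throw a NullPointerException,
// so drop rows containing any null first (Row.anyNull flags them).
val cleaned = dataDF.rdd.filter(row => !row.anyNull)

// Assumed layout: column 0 = label, remaining columns = features.
val data = cleaned.map { row =>
  val label = row.get(0).toString.toDouble
  val features = (1 until row.length).map(i => row.get(i).toString.toDouble)
  LabeledPoint(label, Vectors.dense(features.toArray))
}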

CCing user to keep it on the user list.

On Mon, Apr 6, 2015 at 6:53 AM, Sergio Jiménez Barrio <drarse.a...@gmail.com
> wrote:

> Hi!
>
> I tried your solution and saw that the first row is null. Is this
> important? Can I work with null rows? Some rows have columns with null
> values.
>
> This is the first row of the DataFrame:
> scala> dataDF.take(1)
> res11: Array[org.apache.spark.sql.Row] =
> Array([null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null])
>
>
>
> This is the RDD[LabeledPoint] I created:
> scala> data.take(1)
> 15/04/06 15:46:31 ERROR TaskSetManager: Task 0 in stage 6.0 failed 4
> times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage
> 6.0 (TID 243, 10.101.5.194): java.lang.NullPointerException
>
> Thanks for all.
>
> Sergio J.
>
> 2015-04-03 20:14 GMT+02:00 Joseph Bradley <jos...@databricks.com>:
>
>> I'd recommend going through each step, taking 1 RDD element
>> ("myDataFrame.take(1)"), and examining it to see where this issue is
>> happening.
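>>
>> For example (a rough sketch, reusing the names from the earlier mails in
>> this thread):
>>
>> df.take(1).foreach(println)        // raw row parsed from the JSON
>> features.take(1).foreach(println)  // selected feature columns
>> labels.take(1).foreach(println)    // selected label column
>> data.take(1).foreach(println)      // the RDD[LabeledPoint]; the first
>>                                    // stage that fails is where to look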
>>
>> On Fri, Apr 3, 2015 at 9:44 AM, Sergio Jiménez Barrio <
>> drarse.a...@gmail.com> wrote:
>>
>>> This solution is really good. But I was working with
>>> feature.toString.toDouble because the feature is of type Any. Now, when I
>>> try to work with the created LabeledPoint I get a NullPointerException =/
>>> On 02/04/2015 21:23, "Joseph Bradley" <jos...@databricks.com> wrote:
>>>
>>>> Peter's suggestion sounds good, but watch out for the match case since
>>>> I believe you'll have to match on:
>>>>
>>>> case (Row(feature1, feature2, ...), Row(label)) =>
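>>>>
>>>> Putting that together, a sketch (placeholder feature names; it assumes two
>>>> Double features and converts both DataFrames to RDDs before zipping):
>>>>
>>>> import org.apache.spark.mllib.linalg.Vectors
>>>> import org.apache.spark.mllib.regression.LabeledPoint
>>>> import org.apache.spark.rdd.RDD
>>>> import org.apache.spark.sql.Row
>>>>
>>>> val labeledPoints: RDD[LabeledPoint] = features.rdd.zip(labels.rdd).map {
>>>>   // zip produces (Row, Row) pairs, so match on the tuple of Rows.
>>>>   // A row containing a null won't match and raises a MatchError,
>>>>   // so clean null values beforehand.
>>>>   case (Row(f1: Double, f2: Double), Row(label: Double)) =>
>>>>     LabeledPoint(label, Vectors.dense(f1, f2))
>>>> }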
>>>>
>>>> On Thu, Apr 2, 2015 at 7:57 AM, Peter Rudenko <petro.rude...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi, try the following code:
>>>>>
>>>>> val labeledPoints: RDD[LabeledPoint] = features.zip(labels).map {
>>>>>   case Row(feature1, feature2, ..., label) =>
>>>>>     LabeledPoint(label, Vectors.dense(feature1, feature2, ...))
>>>>> }
>>>>>
>>>>> Thanks,
>>>>> Peter Rudenko
>>>>>
>>>>> On 2015-04-02 17:17, drarse wrote:
>>>>>
>>>>> Hello!
>>>>>
>>>>> I have had a question for a few days now. I am working with DataFrames,
>>>>> and with Spark SQL I imported a JSON file:
>>>>>
>>>>> val df = sqlContext.jsonFile("file.json")
>>>>>
>>>>> In this JSON I have the label and the features. I selected them:
>>>>>
>>>>> val features = df.select("feature1", "feature2", "feature3", ...)
>>>>>
>>>>> val labels = df.select("classification")
>>>>>
>>>>> But now I don't know how to create a LabeledPoint for RandomForest. I
>>>>> tried some solutions without success. Can you help me?
>>>>>
>>>>> Thanks for all!
>>>>>
>>>>>
>>>>
>>>>
>>
>
