Solved! Thanks for your help. I had converted the null values to a Double value (0.0).
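A minimal sketch of that fix, for the archive. This is a hedged reconstruction, not the thread's actual code: it assumes Spark 1.3.1 or later (where the DataFrame na functions are available) and that every null feature should simply become 0.0; dataDF is the DataFrame from the quoted thread, and anyToDouble is a hypothetical helper.

    // Fill every null in the numeric columns with 0.0 before building LabeledPoints.
    val cleanDF = dataDF.na.fill(0.0)

    // Hypothetical per-value alternative: feature.toString.toDouble throws a
    // NullPointerException on null cells, so guard for null explicitly.
    def anyToDouble(a: Any): Double = a match {
      case null      => 0.0
      case d: Double => d
      case other     => other.toString.toDouble
    }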
On 06/04/2015 19:25, "Joseph Bradley" <jos...@databricks.com> wrote:

> I'd make sure you're selecting the correct columns. If not that, then
> your input data might be corrupt.
>
> CCing user to keep it on the user list.
>
> On Mon, Apr 6, 2015 at 6:53 AM, Sergio Jiménez Barrio <
> drarse.a...@gmail.com> wrote:
>
>> Hi!
>>
>> I tried your solution, and I saw that the first row is null. Is this
>> important? Can I work with null rows? Some rows have some columns with
>> null values.
>>
>> This is the first row of the DataFrame:
>>
>> scala> dataDF.take(1)
>> res11: Array[org.apache.spark.sql.Row] =
>> Array([null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null])
>>
>> This is the RDD[LabeledPoint] created:
>>
>> scala> data.take(1)
>> 15/04/06 15:46:31 ERROR TaskSetManager: Task 0 in stage 6.0 failed 4
>> times; aborting job
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 6.0 (TID 243, 10.101.5.194): java.lang.NullPointerException
>>
>> Thanks for all.
>>
>> Sergio J.
>>
>> 2015-04-03 20:14 GMT+02:00 Joseph Bradley <jos...@databricks.com>:
>>
>>> I'd recommend going through each step, taking 1 RDD element
>>> ("myDataFrame.take(1)") and examining it to see where this issue is
>>> happening.
>>>
>>> On Fri, Apr 3, 2015 at 9:44 AM, Sergio Jiménez Barrio <
>>> drarse.a...@gmail.com> wrote:
>>>
>>>> This solution is really good. But I was working with
>>>> feature.toString.toDouble because the features are of type Any. Now,
>>>> when I try to work with the LabeledPoint created, I get a
>>>> NullPointerException =/
>>>> On 02/04/2015 21:23, "Joseph Bradley" <jos...@databricks.com> wrote:
>>>>
>>>>> Peter's suggestion sounds good, but watch out for the match case,
>>>>> since I believe you'll have to match on:
>>>>>
>>>>> case (Row(feature1, feature2, ...), Row(label)) =>
>>>>>
>>>>> On Thu, Apr 2, 2015 at 7:57 AM, Peter Rudenko <petro.rude...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, try this code:
>>>>>>
>>>>>> val labeledPoints: RDD[LabeledPoint] = features.zip(labels).map {
>>>>>>   case Row(feature1, feature2, ..., label) => LabeledPoint(label,
>>>>>>     Vectors.dense(feature1, feature2, ...))
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Peter Rudenko
>>>>>>
>>>>>> On 2015-04-02 17:17, drarse wrote:
>>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> I have had a question for days now. I am working with DataFrames, and
>>>>>> with Spark SQL I imported a JSON file:
>>>>>>
>>>>>> val df = sqlContext.jsonFile("file.json")
>>>>>>
>>>>>> In this JSON I have the label and the features. I selected them:
>>>>>>
>>>>>> val features = df.select("feature1", "feature2", "feature3", ...)
>>>>>> val labels = df.select("cassification")
>>>>>>
>>>>>> But now I don't know how to create a LabeledPoint for RandomForest. I
>>>>>> tried some solutions without success. Can you help me?
>>>>>>
>>>>>> Thanks for all!
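For readers landing here from the archive, the pieces above combine into one end-to-end sketch. Everything in it is hedged rather than code from the thread: it assumes Spark 1.3-era MLlib, takes the column names feature1, feature2, feature3 and cassification from the thread, and fills nulls with 0.0 as in the fix at the top. Selecting the label and the features in a single select avoids zipping two RDDs, since zip requires both sides to have identical partitioning and element counts; if you do keep Peter's zip route, match on the pair as Joseph notes (case (Row(feature1, ...), Row(label)) =>).

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    val df = sqlContext.jsonFile("file.json")

    // Replace null cells with 0.0 so the conversions below never see a null.
    val clean = df.na.fill(0.0)

    // Put the label column first; Row.unapplySeq then binds the remaining
    // columns as a Seq[Any] via the `features @ _*` pattern.
    val data: RDD[LabeledPoint] =
      clean.select("cassification", "feature1", "feature2", "feature3").map {
        case Row(label, features @ _*) =>
          LabeledPoint(label.toString.toDouble,
            Vectors.dense(features.map(_.toString.toDouble).toArray))
      }

    data.take(1) // should now return a LabeledPoint instead of throwing an NPE

From there, data can be passed on to MLlib, for example to RandomForest.trainClassifier.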