Solved! Thanks for your help. I had converted the null values to a Double value (0.0).
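A minimal sketch of that fix, for the archive. This is a hedged reconstruction, not the thread's actual code: it assumes Spark 1.3.1 or later (where the DataFrame na functions are available) and that every null feature should simply become 0.0; dataDF is the DataFrame from the quoted thread, and anyToDouble is a hypothetical helper.

    // Fill every null in the numeric columns with 0.0 before building LabeledPoints.
    val cleanDF = dataDF.na.fill(0.0)

    // Hypothetical per-value alternative: feature.toString.toDouble throws a
    // NullPointerException on null cells, so guard for null explicitly.
    def anyToDouble(a: Any): Double = a match {
      case null      => 0.0
      case d: Double => d
      case other     => other.toString.toDouble
    }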
On 06/04/2015 19:25, "Joseph Bradley" <jos...@databricks.com> wrote:

> I'd make sure you're selecting the correct columns. If not that, then
> your input data might be corrupt.
>
> CCing user to keep it on the user list.
>
> On Mon, Apr 6, 2015 at 6:53 AM, Sergio Jiménez Barrio <
> drarse.a...@gmail.com> wrote:
>
>> Hi!
>>
>> I tried your solution, and I saw that the first row is null. Is this
>> important? Can I work with null rows? Some rows have some columns with
>> null values.
>>
>> This is the first row of the DataFrame:
>>
>> scala> dataDF.take(1)
>> res11: Array[org.apache.spark.sql.Row] =
>> Array([null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null])
>>
>> This is the RDD[LabeledPoint] created:
>>
>> scala> data.take(1)
>> 15/04/06 15:46:31 ERROR TaskSetManager: Task 0 in stage 6.0 failed 4
>> times; aborting job
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>> in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>> 6.0 (TID 243, 10.101.5.194): java.lang.NullPointerException
>>
>> Thanks for all.
>>
>> Sergio J.
>>
>> 2015-04-03 20:14 GMT+02:00 Joseph Bradley <jos...@databricks.com>:
>>
>>> I'd recommend going through each step, taking 1 RDD element
>>> ("myDataFrame.take(1)") and examining it to see where this issue is
>>> happening.
>>>
>>> On Fri, Apr 3, 2015 at 9:44 AM, Sergio Jiménez Barrio <
>>> drarse.a...@gmail.com> wrote:
>>>
>>>> This solution is really good. But I was working with
>>>> feature.toString.toDouble because the features are of type Any. Now,
>>>> when I try to work with the LabeledPoint created, I get a
>>>> NullPointerException =/
>>>> On 02/04/2015 21:23, "Joseph Bradley" <jos...@databricks.com> wrote:
>>>>
>>>>> Peter's suggestion sounds good, but watch out for the match case,
>>>>> since I believe you'll have to match on:
>>>>>
>>>>> case (Row(feature1, feature2, ...), Row(label)) =>
>>>>>
>>>>> On Thu, Apr 2, 2015 at 7:57 AM, Peter Rudenko <petro.rude...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, try this code:
>>>>>>
>>>>>> val labeledPoints: RDD[LabeledPoint] = features.zip(labels).map {
>>>>>>   case Row(feature1, feature2, ..., label) => LabeledPoint(label,
>>>>>>     Vectors.dense(feature1, feature2, ...))
>>>>>> }
>>>>>>
>>>>>> Thanks,
>>>>>> Peter Rudenko
>>>>>>
>>>>>> On 2015-04-02 17:17, drarse wrote:
>>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> I have had a question for days now. I am working with DataFrames, and
>>>>>> with Spark SQL I imported a JSON file:
>>>>>>
>>>>>> val df = sqlContext.jsonFile("file.json")
>>>>>>
>>>>>> In this JSON I have the label and the features. I selected them:
>>>>>>
>>>>>> val features = df.select("feature1", "feature2", "feature3", ...)
>>>>>> val labels = df.select("cassification")
>>>>>>
>>>>>> But now I don't know how to create a LabeledPoint for RandomForest. I
>>>>>> tried some solutions without success. Can you help me?
>>>>>>
>>>>>> Thanks for all!
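For readers landing here from the archive, the pieces above combine into one end-to-end sketch. Everything in it is hedged rather than code from the thread: it assumes Spark 1.3-era MLlib, takes the column names feature1, feature2, feature3 and cassification from the thread, and fills nulls with 0.0 as in the fix at the top. Selecting the label and the features in a single select avoids zipping two RDDs, since zip requires both sides to have identical partitioning and element counts; if you do keep Peter's zip route, match on the pair as Joseph notes (case (Row(feature1, ...), Row(label)) =>).

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    val df = sqlContext.jsonFile("file.json")

    // Replace null cells with 0.0 so the conversions below never see a null.
    val clean = df.na.fill(0.0)

    // Put the label column first; Row.unapplySeq then binds the remaining
    // columns as a Seq[Any] via the `features @ _*` pattern.
    val data: RDD[LabeledPoint] =
      clean.select("cassification", "feature1", "feature2", "feature3").map {
        case Row(label, features @ _*) =>
          LabeledPoint(label.toString.toDouble,
            Vectors.dense(features.map(_.toString.toDouble).toArray))
      }

    data.take(1) // should now return a LabeledPoint instead of throwing an NPE

From there, data can be passed on to MLlib, for example to RandomForest.trainClassifier.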