I think there is a minor error here: the first example needs a .tail after the toSeq call:
df.map { row =>
  (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double]))
}.toDataFrame("label", "features")

On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust <mich...@databricks.com> wrote:
> It sounds like you probably want to do a standard Spark map that results
> in a tuple with the structure you are looking for. You can then just
> assign names to turn it back into a dataframe.
>
> Assuming the first column is your label and the rest are features, you
> can do something like this:
>
> val df = sc.parallelize(
>   (1.0, 2.3, 2.4) ::
>   (1.2, 3.4, 1.2) ::
>   (1.2, 2.3, 1.2) :: Nil).toDataFrame("a", "b", "c")
>
> df.map { row =>
>   (row.getDouble(0), row.toSeq.map(_.asInstanceOf[Double]))
> }.toDataFrame("label", "features")
>
> df: org.apache.spark.sql.DataFrame = [label: double, features: array<double>]
>
> If you'd prefer to stick closer to SQL, you can define a UDF:
>
> val createArray = udf((a: Double, b: Double) => Seq(a, b))
> df.select('a as 'label, createArray('b, 'c) as 'features)
>
> df: org.apache.spark.sql.DataFrame = [label: double, features: array<double>]
>
> We'll add createArray as a first-class member of the DSL.
>
> Michael
>
> On Wed, Feb 11, 2015 at 6:37 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>
>> Hey All,
>>
>> I've been playing around with the new DataFrame and ML pipelines APIs
>> and am having trouble accomplishing what seems like it should be a
>> fairly basic task.
>>
>> I have a DataFrame where each column is a Double. I'd like to turn this
>> into a DataFrame with a features column and a label column that I can
>> feed into a regression.
>>
>> So far all the paths I've gone down have led me to internal APIs or
>> convoluted casting in and out of RDD[Row] and DataFrame. Is there a
>> simple way of accomplishing this?
>>
>> Any assistance (lookin' at you, Xiangrui) much appreciated,
>> Sandy
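For anyone finding this thread later, here is a self-contained version of the corrected example. This is a minimal sketch, not a definitive recipe: it assumes a spark-shell session (so sc is already in scope) on a released Spark build, where the pre-release toDataFrame call shown above became toDF and import sqlContext.implicits._ supplies the RDD-to-DataFrame conversion.

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(
  (1.0, 2.3, 2.4) ::
  (1.2, 3.4, 1.2) ::
  (1.2, 2.3, 1.2) :: Nil).toDF("a", "b", "c")

// Column 0 is the label; .tail drops it so the features
// sequence holds only the remaining columns.
val labeled = df.map { row =>
  (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double]))
}.toDF("label", "features")

// labeled: org.apache.spark.sql.DataFrame = [label: double, features: array<double>]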
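As for the promised createArray: a built-in array function did land in org.apache.spark.sql.functions in a later release (1.4, if I recall correctly), which removes the need for the hand-rolled UDF:

import org.apache.spark.sql.functions.array

// Keep column a as the label and collect b and c into an array
// column, mirroring the UDF-based select above.
val assembled = df.select(df("a").as("label"), array(df("b"), df("c")).as("features"))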