I think there is a minor error here: the first example needs a "tail"
after the toSeq, so that the label in column 0 doesn't also end up in
the features array:

df.map { row =>
  (row.getDouble(0), row.toSeq.tail.map(_.asInstanceOf[Double]))
}.toDataFrame("label", "features")
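
One more note: if the end goal is feeding this into one of the new ML
pipeline estimators, the features column likely needs to be a Vector
rather than an array<double>. A minimal sketch, untested, assuming the
Vector UDT is picked up by the same tuple-to-DataFrame conversion:

import org.apache.spark.mllib.linalg.Vectors

df.map { row =>
  // keep column 0 as the label, pack the remaining doubles into a dense Vector
  (row.getDouble(0),
   Vectors.dense(row.toSeq.tail.map(_.asInstanceOf[Double]).toArray))
}.toDataFrame("label", "features")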

On Wed, Feb 11, 2015 at 7:46 PM, Michael Armbrust
<mich...@databricks.com> wrote:
> It sounds like you probably want to do a standard Spark map that results
> in a tuple with the structure you are looking for.  You can then just
> assign names to turn it back into a DataFrame.
>
> Assuming the first column is your label and the rest are features, you
> can do something like this:
>
> val df = sc.parallelize(
>   (1.0, 2.3, 2.4) ::
>   (1.2, 3.4, 1.2) ::
>   (1.2, 2.3, 1.2) :: Nil).toDataFrame("a", "b", "c")
>
> df.map { row =>
>   (row.getDouble(0), row.toSeq.map(_.asInstanceOf[Double]))
> }.toDataFrame("label", "features")
>
> df: org.apache.spark.sql.DataFrame = [label: double, features: array<double>]
>
> If you'd prefer to stick closer to SQL you can define a UDF:
>
> val createArray = udf((a: Double, b: Double) => Seq(a, b))
> df.select('a as 'label, createArray('b, 'c) as 'features)
>
> df: org.apache.spark.sql.DataFrame = [label: double, features: array<double>]
>
> We'll add createArray as a first-class member of the DSL.
>
> Michael
>
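
A side note for anyone finding this in the archives: something along the
lines of createArray did eventually ship as the array function in
org.apache.spark.sql.functions (in a release after this thread, if I
remember right), so the UDF-free version of the select would be:

import org.apache.spark.sql.functions.array

// same result as the createArray UDF above, using the built-in function
df.select('a as 'label, array('b, 'c) as 'features)
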
> On Wed, Feb 11, 2015 at 6:37 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
>>
>> Hey All,
>>
>> I've been playing around with the new DataFrame and ML pipelines APIs and
>> am having trouble accomplishing what seems like it should be a fairly
>> basic task.
>>
>> I have a DataFrame where each column is a Double.  I'd like to turn this
>> into a DataFrame with a features column and a label column that I can feed
>> into a regression.
>>
>> So far all the paths I've gone down have led me to internal APIs or
>> convoluted casting in and out of RDD[Row] and DataFrame.  Is there a simple
>> way of accomplishing this?
>>
>> any assistance (lookin' at you Xiangrui) much appreciated,
>> Sandy
>
>
