Hi, I must admit I had issues as well in finding a sample that does that (hopefully the Spark folks can add more examples, or someone on the list can post sample code?).
Hopefully you can reuse the sample below. So, you start from an RDD of doubles (myRdd):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// make a Row from each element
val toRddOfRows = myRdd.map(doubleValues => Row.fromSeq(doubleValues))

// Then you can call toDF directly and Spark will build a schema for you.
// Beware: you will need the implicits in scope, and toDF needs an Encoder
// for the element type, so call it on the RDD of Seq[Double] rather than
// on the RDD[Row] (for RDD[Row], use the explicit schema below).
import sqlContext.implicits._
val df = myRdd.toDF()

// Or you can create a schema yourself:
def createSchema(row: Row) = {
  val first = row.toSeq
  val firstWithIdx = first.zipWithIndex
  val fields = firstWithIdx.map(tpl => StructField("Col" + tpl._2, DoubleType, false))
  StructType(fields)
}

val mySchema = createSchema(toRddOfRows.first())

// returning DataFrame
val mydf = sqlContext.createDataFrame(toRddOfRows, mySchema)

hth

You need to define a schema to make a DataFrame out of your list... check the Spark docs on how to make a DataFrame, or some of the machine learning examples.

On 25 Sep 2016 12:57 pm, "Dan Bikle" <bikle...@gmail.com> wrote:

> Hello World,
>
> I am familiar with Python and I am learning Spark-Scala.
>
> I want to build a DataFrame which has the structure described by this syntax:
>
> // Prepare training data from a list of (label, features) tuples.
> val training = spark.createDataFrame(Seq(
>   (1.1, Vectors.dense(1.1, 0.1)),
>   (0.2, Vectors.dense(1.0, -1.0)),
>   (3.0, Vectors.dense(1.3, 1.0)),
>   (1.0, Vectors.dense(1.2, -0.5))
> )).toDF("label", "features")
>
> I got the above syntax from this URL:
>
> http://spark.apache.org/docs/latest/ml-pipeline.html
>
> Currently my data is in an array which I had pulled out of a DF:
>
> val my_a = gspc17_df.collect().map { row =>
>   Seq(row(2), Vectors.dense(row(3).asInstanceOf[Double], row(4).asInstanceOf[Double]))
> }
>
> The structure of my array is very similar to the above DF:
>
> my_a: Array[Seq[Any]] =
>   Array(
>     List(-1.4830674013266898, [-0.004192832940431825,-0.003170667657263393]),
>     List(-0.05876766500768526, [-0.008462913654529357,-0.006880595828929472]),
>     List(1.0109273250546658, [-3.1816797620416693E-4,-0.006502619326182358]))
>
> How do I copy data from my array into a DataFrame which has the above structure?
>
> I tried this syntax:
>
> val my_df = spark.createDataFrame(my_a).toDF("label","features")
>
> Spark barked at me:
>
> <console>:105: error: inferred type arguments [Seq[Any]] do not conform
>     to method createDataFrame's type parameter bounds [A <: Product]
>        val my_df = spark.createDataFrame(my_a).toDF("label","features")
>                          ^
> <console>:105: error: type mismatch;
>  found   : scala.collection.mutable.WrappedArray[Seq[Any]]
>  required: Seq[A]
>        val my_df = spark.createDataFrame(my_a).toDF("label","features")
>                          ^
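To tie the schema route above to the exact (label, features) shape asked about, here is a minimal sketch. It assumes Spark 2.x, a SparkSession named spark, and that columns 2, 3 and 4 of gspc17_df hold Doubles (both names are taken from the thread; adjust to your data). SQLDataTypes.VectorType is the SQL type for ML vectors:

import org.apache.spark.ml.linalg.{SQLDataTypes, Vectors}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Map each source row to Row(label, featureVector) on the cluster;
// no collect() to the driver needed. (gspc17_df/spark assumed from the thread.)
val rowRdd = gspc17_df.rdd.map { r =>
  Row(r.getDouble(2), Vectors.dense(r.getDouble(3), r.getDouble(4)))
}

// Spell out the schema: a Double label plus an ML vector column.
val schema = StructType(Seq(
  StructField("label", DoubleType, nullable = false),
  StructField("features", SQLDataTypes.VectorType, nullable = false)))

val my_df = spark.createDataFrame(rowRdd, schema)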
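If writing the schema by hand feels heavy, an alternative sketch under the same assumptions maps each row to a (Double, Vector) tuple; tuples are Products, so Spark can derive the schema itself, which is exactly why the createDataFrame(Seq(...)) example from the ML pipeline docs compiles while an Array[Seq[Any]] does not:

import org.apache.spark.ml.linalg.Vectors
import spark.implicits._  // brings the RDD-to-DataFrame implicits into scope

// (Double, Vector) is a Product, so toDF can infer the two columns.
val my_df = gspc17_df.rdd
  .map(r => (r.getDouble(2), Vectors.dense(r.getDouble(3), r.getDouble(4))))
  .toDF("label", "features")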