Hi, I must admit I had issues as well in finding a sample that does that (hopefully the Spark folks can add more examples, or someone on the list can post sample code?).
Hopefully you can reuse the sample below. So, you start from an RDD of doubles (myRdd):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// make a Row from each element
val toRddOfRows = myRdd.map(doubleValues => Row.fromSeq(doubleValues))

// Then you can call toDF directly and Spark will build a schema for you.
// Beware: you will need the implicits in scope, and toDF needs an Encoder
// for the element type, so call it on the RDD of Seq[Double] rather than
// on the RDD[Row] (for RDD[Row], use the explicit schema below).
import sqlContext.implicits._
val df = myRdd.toDF()

// Or you can create a schema yourself:
def createSchema(row: Row) = {
  val first = row.toSeq
  val firstWithIdx = first.zipWithIndex
  val fields = firstWithIdx.map(tpl => StructField("Col" + tpl._2, DoubleType, false))
  StructType(fields)
}

val mySchema = createSchema(toRddOfRows.first())

// returning DataFrame
val mydf = sqlContext.createDataFrame(toRddOfRows, mySchema)

hth

You need to define a schema to make a DataFrame out of your list... check the Spark docs on how to make a DataFrame, or some of the machine learning examples.

On 25 Sep 2016 12:57 pm, "Dan Bikle" <bikle...@gmail.com> wrote:

> Hello World,
>
> I am familiar with Python and I am learning Spark-Scala.
>
> I want to build a DataFrame which has the structure described by this syntax:
>
> // Prepare training data from a list of (label, features) tuples.
> val training = spark.createDataFrame(Seq(
>   (1.1, Vectors.dense(1.1, 0.1)),
>   (0.2, Vectors.dense(1.0, -1.0)),
>   (3.0, Vectors.dense(1.3, 1.0)),
>   (1.0, Vectors.dense(1.2, -0.5))
> )).toDF("label", "features")
>
> I got the above syntax from this URL:
>
> http://spark.apache.org/docs/latest/ml-pipeline.html
>
> Currently my data is in an array which I had pulled out of a DF:
>
> val my_a = gspc17_df.collect().map { row =>
>   Seq(row(2), Vectors.dense(row(3).asInstanceOf[Double], row(4).asInstanceOf[Double]))
> }
>
> The structure of my array is very similar to the above DF:
>
> my_a: Array[Seq[Any]] =
>   Array(
>     List(-1.4830674013266898, [-0.004192832940431825,-0.003170667657263393]),
>     List(-0.05876766500768526, [-0.008462913654529357,-0.006880595828929472]),
>     List(1.0109273250546658, [-3.1816797620416693E-4,-0.006502619326182358]))
>
> How do I copy data from my array into a DataFrame which has the above structure?
>
> I tried this syntax:
>
> val my_df = spark.createDataFrame(my_a).toDF("label","features")
>
> Spark barked at me:
>
> <console>:105: error: inferred type arguments [Seq[Any]] do not conform
>     to method createDataFrame's type parameter bounds [A <: Product]
>        val my_df = spark.createDataFrame(my_a).toDF("label","features")
>                          ^
> <console>:105: error: type mismatch;
>  found   : scala.collection.mutable.WrappedArray[Seq[Any]]
>  required: Seq[A]
>        val my_df = spark.createDataFrame(my_a).toDF("label","features")
>                          ^
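To tie the schema route above to the exact (label, features) shape asked about, here is a minimal sketch. It assumes Spark 2.x, a SparkSession named spark, and that columns 2, 3 and 4 of gspc17_df hold Doubles (both names are taken from the thread; adjust to your data). SQLDataTypes.VectorType is the SQL type for ML vectors:

import org.apache.spark.ml.linalg.{SQLDataTypes, Vectors}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Map each source row to Row(label, featureVector) on the cluster;
// no collect() to the driver needed. (gspc17_df/spark assumed from the thread.)
val rowRdd = gspc17_df.rdd.map { r =>
  Row(r.getDouble(2), Vectors.dense(r.getDouble(3), r.getDouble(4)))
}

// Spell out the schema: a Double label plus an ML vector column.
val schema = StructType(Seq(
  StructField("label", DoubleType, nullable = false),
  StructField("features", SQLDataTypes.VectorType, nullable = false)))

val my_df = spark.createDataFrame(rowRdd, schema)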
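If writing the schema by hand feels heavy, an alternative sketch under the same assumptions maps each row to a (Double, Vector) tuple; tuples are Products, so Spark can derive the schema itself, which is exactly why the createDataFrame(Seq(...)) example from the ML pipeline docs compiles while an Array[Seq[Any]] does not:

import org.apache.spark.ml.linalg.Vectors
import spark.implicits._  // brings the RDD-to-DataFrame implicits into scope

// (Double, Vector) is a Product, so toDF can infer the two columns.
val my_df = gspc17_df.rdd
  .map(r => (r.getDouble(2), Vectors.dense(r.getDouble(3), r.getDouble(4))))
  .toDF("label", "features")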