Hi All,

Hoping you can help:


I have created an RDD from a NoSQL database and want to convert it to a 
DataFrame. I have tried several approaches, but all of them result in errors.

    val df = sc.couchbaseQuery(test).map(_.value).collect().foreach(println)


{"accountStatus":"AccountOpen","custId":"140034"}
{"accountStatus":"AccountOpen","custId":"140385"}
{"accountStatus":"AccountClosed","subId":"10795","custId":"139698","subStatus":"Active"}
{"accountStatus":"AccountClosed","subId":"11364","custId":"140925","subStatus":"Paused"}
{"accountStatus":"AccountOpen","subId":"10413","custId":"138842","subStatus":"Active"}
{"accountStatus":"AccountOpen","subId":"10414","custId":"138842","subStatus":"Active"}
{"accountStatus":"AccountClosed","subId":"11314","custId":"140720","subStatus":"Paused"}
{"accountStatus":"AccountOpen","custId":"139166"}
{"accountStatus":"AccountClosed","subId":"10735","custId":"139558","subStatus":"Paused"}
{"accountStatus":"AccountOpen","custId":"139575"}
df: Unit = ()

I have tried adding .toDF() to the end of my code, and also creating a schema 
and using createDataFrame, but both give errors. What's the best approach to 
converting the RDD to a DataFrame?

import org.apache.spark.sql.types._

// The schema is encoded in a string
val schemaString = "accountStatus subId custId subStatus"

// Generate the schema based on the string of schema
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)
// Attempt to build the DataFrame from the value above (this is the call that fails)
val peopleDF = spark.createDataFrame(df, schema)
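
For reference, the REPL output above shows df: Unit = (), because 
collect().foreach(println) prints the rows and returns nothing, so there is 
no RDD left for toDF() or createDataFrame to work on. A minimal sketch of one 
possible approach follows. It assumes Spark 2.2 or later, and that each row 
value can be rendered as a JSON string (for example via _.value.toString, 
which is an assumption about the connector, not something confirmed here). 
The local session and the two sample rows are stand-ins for the real 
Couchbase query.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: a local session stands in for the `spark` value of spark-shell.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-to-dataframe-sketch")
  .getOrCreate()

// Stand-in for sc.couchbaseQuery(test).map(_.value.toString):
// an RDD[String] holding one JSON document per element.
val jsonRdd = spark.sparkContext.parallelize(Seq(
  """{"accountStatus":"AccountOpen","custId":"140034"}""",
  """{"accountStatus":"AccountClosed","subId":"10795","custId":"139698","subStatus":"Active"}"""
))

// Same schema as in the post: four nullable string columns.
val schema = StructType(
  "accountStatus subId custId subStatus".split(" ")
    .map(fieldName => StructField(fieldName, StringType, nullable = true))
)

// Keep the RDD itself (no collect/foreach), wrap it as a Dataset[String],
// and let the JSON reader map each document onto the schema.
// Fields missing from a document (subId, subStatus) come back as null.
val df = spark.read.schema(schema).json(spark.createDataset(jsonRdd)(Encoders.STRING))

df.show()
```

Dropping .schema(schema) should also work, in which case Spark infers the 
columns by scanning the JSON documents.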
