Looks like you have an RDD of JSON. Try this: http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
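For concreteness, a minimal spark-shell sketch of that JSON-datasets approach, assuming Spark 2.x. The sample records mirror the ones quoted below; in the real job the RDD would come from the Couchbase query rather than parallelize:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("json-rdd-to-df")
  .getOrCreate()
import spark.implicits._

// Stand-in for sc.couchbaseQuery(test).map(_.value.toString):
// an RDD where each element is one JSON record as a string.
val jsonRdd = spark.sparkContext.parallelize(Seq(
  """{"accountStatus":"AccountOpen","custId":"140034"}""",
  """{"accountStatus":"AccountClosed","subId":"10795","custId":"139698","subStatus":"Active"}"""
))

// spark.read.json infers one schema across all records;
// fields missing from a record (e.g. subId) come back as null.
val df = spark.read.json(jsonRdd.toDS())
df.printSchema()
df.show(false)
```

Note that collect().foreach(println) only prints and returns Unit, so the DataFrame has to be built from the RDD itself, not from the printed output.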
Mohit Jaggi
Founder, Data Orchard LLC
www.dataorchardllc.com

> On Nov 28, 2016, at 9:49 AM, Jun Kim <i2r....@gmail.com> wrote:
>
> Hi, Mark!
>
> What kind of error message do you get?
>
> The simplest way to convert an RDD to a DF is to import the implicits and use toDF:
>
>     import spark.implicits._
>     val df = rdd.toDF
>
> :-)
>
> On Tue, Nov 29, 2016 at 1:26 AM, Mark Mikolajczak <m...@flayranalytics.co.uk> wrote:
>
>> Hi All,
>>
>> Hoping you can help.
>>
>> I have created an RDD from a NoSQL database and I want to convert the RDD to
>> a data frame. I have tried many options but all result in errors.
>>
>>     val df = sc.couchbaseQuery(test).map(_.value).collect().foreach(println)
>>
>>     {"accountStatus":"AccountOpen","custId":"140034"}
>>     {"accountStatus":"AccountOpen","custId":"140385"}
>>     {"accountStatus":"AccountClosed","subId":"10795","custId":"139698","subStatus":"Active"}
>>     {"accountStatus":"AccountClosed","subId":"11364","custId":"140925","subStatus":"Paused"}
>>     {"accountStatus":"AccountOpen","subId":"10413","custId":"138842","subStatus":"Active"}
>>     {"accountStatus":"AccountOpen","subId":"10414","custId":"138842","subStatus":"Active"}
>>     {"accountStatus":"AccountClosed","subId":"11314","custId":"140720","subStatus":"Paused"}
>>     {"accountStatus":"AccountOpen","custId":"139166"}
>>     {"accountStatus":"AccountClosed","subId":"10735","custId":"139558","subStatus":"Paused"}
>>     {"accountStatus":"AccountOpen","custId":"139575"}
>>     df: Unit = ()
>>
>> I have tried adding .toDF() to the end of my code and also creating a schema
>> and using createDataFrame, but I receive errors. What's the best approach to
>> converting the RDD to a DataFrame?
>>     import org.apache.spark.sql.types._
>>
>>     // The schema is encoded in a string
>>     val schemaString = "accountStatus subId custId subStatus"
>>
>>     // Generate the schema based on the string of schema
>>     val fields = schemaString.split(" ")
>>       .map(fieldName => StructField(fieldName, StringType, nullable = true))
>>     val schema = StructType(fields)
>>
>>     val peopleDF = spark.createDataFrame(df, schema)
>
> --
> Taejun Kim
>
> Data Mining Lab.
> School of Electrical and Computer Engineering
> University of Seoul
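A likely reason the quoted createDataFrame attempt fails: `df` above is `Unit` (the result of foreach), while createDataFrame expects an RDD[Row] alongside the schema. A minimal spark-shell sketch of that route, assuming Spark 2.x, with hand-built rows standing in for the Couchbase values:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-row-schema")
  .getOrCreate()

// Same schema-from-string construction as in the thread.
val schemaString = "accountStatus subId custId subStatus"
val schema = StructType(schemaString.split(" ")
  .map(name => StructField(name, StringType, nullable = true)))

// One Row per record, fields in schema order; absent fields are null.
val rowRdd = spark.sparkContext.parallelize(Seq(
  Row("AccountOpen", null, "140034", null),
  Row("AccountClosed", "10795", "139698", "Active")
))

val peopleDF = spark.createDataFrame(rowRdd, schema)
peopleDF.show(false)
```

In practice this means mapping each JSON value to a Row before calling createDataFrame; for JSON strings, spark.read.json is usually the shorter path since it infers the schema itself.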