Got it to work, thanks
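
In case it helps anyone searching the archives, here is a minimal sketch of
one way to do the conversion, assuming Spark 1.x, an existing SparkContext
sc, and the tuple RDD jsonGzip from the question below:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val sqlContext = new SQLContext(sc)

val schema = StructType(
  StructField("cty", StringType, false) ::
  StructField("hse", StringType, false) ::
  StructField("nm", StringType, false) ::
  StructField("yrs", StringType, false) :: Nil)

// createDataFrame(rowRDD, schema) wants an RDD[Row], so map each
// tuple into a Row first.
val rowRDD = jsonGzip.map { case (cty, hse, nm, yrs) => Row(cty, hse, nm, yrs) }
val unzipJSON = sqlContext.createDataFrame(rowRDD, schema)

// Alternatively, for an RDD of tuples the implicit toDF conversion
// infers the schema from the tuple's types; you supply only the names.
import sqlContext.implicits._
val viaToDF = jsonGzip.toDF("cty", "hse", "nm", "yrs")

Either route gives the column names from the question; toDF just skips the
hand-written schema, at the cost of the string columns being inferred as
nullable.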
On Sun, 20 Dec 2015 at 17:01 Eran Witkon <eranwit...@gmail.com> wrote:

> I might be missing your point, but I don't get it.
> My understanding is that I need an RDD containing Rows, but how do I get one?
>
> I started with a DataFrame,
> ran a map on it, and got an RDD[(String, String, String, String)]. Now I
> want to convert it back to a DataFrame, and it's failing.
>
> Why?
>
>
> On Sun, Dec 20, 2015 at 4:49 PM Ted Yu <yuzhih...@gmail.com> wrote:
>
>> See the comment for the createDataFrame(rowRDD: RDD[Row], schema: StructType)
>> method:
>>
>>    * Creates a [[DataFrame]] from an [[RDD]] containing [[Row]]s using
>> the given schema.
>>    * It is important to make sure that the structure of every [[Row]] of
>> the provided RDD matches
>>    * the provided schema. Otherwise, there will be a runtime exception.
>>    * Example:
>>    * {{{
>>    *  import org.apache.spark.sql._
>>    *  import org.apache.spark.sql.types._
>>    *  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>    *
>>    *  val schema =
>>    *    StructType(
>>    *      StructField("name", StringType, false) ::
>>    *      StructField("age", IntegerType, true) :: Nil)
>>    *
>>    *  val people =
>>    *    sc.textFile("examples/src/main/resources/people.txt").map(
>>    *      _.split(",")).map(p => Row(p(0), p(1).trim.toInt))
>>    *  val dataFrame = sqlContext.createDataFrame(people, schema)
>>    *  dataFrame.printSchema
>>    *  // root
>>    *  // |-- name: string (nullable = false)
>>    *  // |-- age: integer (nullable = true)
>>    * }}}
>>
>> Cheers
>>
>> On Sun, Dec 20, 2015 at 6:31 AM, Eran Witkon <eranwit...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have an RDD, jsonGzip:
>>> res3: org.apache.spark.rdd.RDD[(String, String, String, String)] =
>>> MapPartitionsRDD[8] at map at <console>:65
>>>
>>> which I want to convert to a DataFrame with a schema, so I created one:
>>>
>>> val schema =
>>>   StructType(
>>>     StructField("cty", StringType, false) ::
>>>     StructField("hse", StringType, false) ::
>>>     StructField("nm", StringType, false) ::
>>>     StructField("yrs", StringType, false) :: Nil)
>>>
>>> and called
>>>
>>> val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
>>> <console>:36: error: overloaded method value createDataFrame with 
>>> alternatives:
>>>   (rdd: org.apache.spark.api.java.JavaRDD[_],beanClass: 
>>> Class[_])org.apache.spark.sql.DataFrame <and>
>>>   (rdd: org.apache.spark.rdd.RDD[_],beanClass: 
>>> Class[_])org.apache.spark.sql.DataFrame <and>
>>>   (rowRDD: 
>>> org.apache.spark.api.java.JavaRDD[org.apache.spark.sql.Row],schema: 
>>> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame <and>
>>>   (rowRDD: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row],schema: 
>>> org.apache.spark.sql.types.StructType)org.apache.spark.sql.DataFrame
>>>  cannot be applied to (org.apache.spark.rdd.RDD[(String, String, String, 
>>> String)], org.apache.spark.sql.types.StructType)
>>>        val unzipJSON = sqlContext.createDataFrame(jsonGzip,schema)
>>>
>>>
>>> But as you can see, I don't have the right RDD type.
>>>
>>> So how can I get a DataFrame with the right column names?
>>>
>>>
>>>
>>
