Oops, I hit send accidentally. Here is the complete example.
In pseudocode, assuming comma-separated input data:
scala> case class Address(street: String, city: String)
scala> case class User(name: String, address: Address)
scala> val df = sc.textFile("/path/to/stuff").
         map { line =>
           val array = line.split(",")  // assume: "name,street,city"
           User(array(0), Address(array(1), array(2)))
         }.toDF()
scala> df.printSchema
root
 |-- name: string (nullable = true)
 |-- address: struct (nullable = true)
 |    |-- street: string (nullable = true)
 |    |-- city: string (nullable = true)

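
The original question asked for a single column named "user" holding the whole struct. A minimal sketch of one way to get that, assuming a Spark 1.x spark-shell session where `sc` and `sqlContext` are predefined: wrap each User in a one-field case class, so the field name becomes the column name. `UserRow` and the sample data are hypothetical names made up for illustration, not anything from Spark itself.

```scala
// Assumption: running inside spark-shell (Spark 1.x), so `sc` and
// `sqlContext` already exist. The import is what makes toDF() available.
import sqlContext.implicits._

case class Address(street: String, city: String)
case class User(name: String, address: Address)

// Hypothetical wrapper: its single field name becomes the column name.
case class UserRow(user: User)

// Made-up sample data standing in for an existing RDD[User].
val users = sc.parallelize(Seq(
  User("Ann", Address("1 Main St", "Springfield")),
  User("Bob", Address("2 Oak Ave", "Shelbyville"))))

// One top-level column "user", typed as a nested struct.
val userDf = users.map(u => UserRow(u)).toDF()
userDf.printSchema

// Nested fields can be referenced with dot notation.
userDf.select("user.name", "user.address.city").show()
```

The wrapper adds no extra pass over the data; it only changes how the schema is derived from the case class.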
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
On Tue, Nov 17, 2015 at 11:16 AM, Dean Wampler <[email protected]>
wrote:
> One way to do it in the Scala API is to use a tuple or case class
> with nested tuples or case classes and/or primitives. It works fine if you
> convert to a DataFrame, too; you can reference nested elements using dot
> notation. I think it would work similarly in Python.
>
> In pseudocode, assuming comma-separated input data:
>
> case class Address(street: String, city: String)
> case class User (name: String, address: Address)
>
> sc.textFile("/path/to/stuff").
> map { line =>
> line.split(0)
> dean
>
>
> On Tue, Nov 17, 2015 at 11:06 AM, fhussonnois <[email protected]>
> wrote:
>
>> Hi,
>>
>> I need to convert an RDD[User] to a DataFrame containing a single
>> column named "user". The "user" column should be a nested struct with all
>> User properties.
>>
>> How can I implement this efficiently?
>>
>> Thank you in advance