Crap. Hit send accidentally...

In pseudocode, assuming comma-separated input data:

scala> case class Address(street: String, city: String)
scala> case class User(name: String, address: Address)

scala> val df = sc.textFile("/path/to/stuff").
  map { line =>
    val array = line.split(",")   // assume: "name,street,city"
    User(array(0), Address(array(1), array(2)))
  }.toDF()

scala> df.printSchema
root
 |-- name: string (nullable = true)
 |-- address: struct (nullable = true)
 |    |-- street: string (nullable = true)
 |    |-- city: string (nullable = true)
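
To get the single "user" column the original question asks for, one option (a sketch, not the only way) is to wrap the top-level columns in a struct. The version below assumes Spark 2.x or later with a SparkSession and a local master. The object name NestedUserExample, the helper nestAsUser, and the sample row are made up for illustration. In a 1.x shell you would use sc and sqlContext.implicits._ instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

// Same case classes as above; they must live outside the method that calls toDF().
case class Address(street: String, city: String)
case class User(name: String, address: Address)

object NestedUserExample {
  // Hypothetical helper: wrap every top-level column of df in one struct column named "user".
  def nestAsUser(df: DataFrame): DataFrame =
    df.select(struct(df.columns.map(c => col(c)): _*).as("user"))

  def main(args: Array[String]): Unit = {
    // Assumption: local master, for illustration only.
    val spark = SparkSession.builder()
      .appName("nested-user")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the RDD[User] from the question.
    val users = spark.sparkContext.parallelize(Seq(
      User("Ann", Address("1 Main St", "Springfield"))))

    val nested = nestAsUser(users.toDF())
    nested.printSchema()                       // a single top-level column: user
    nested.select("user.address.city").show()  // dot notation reaches into the struct

    spark.stop()
  }
}
```

The dot notation then works through the extra level, as in nested.select("user.address.city").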


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Tue, Nov 17, 2015 at 11:16 AM, Dean Wampler <[email protected]>
wrote:

> One way to do it in the Scala API is to use a tuple or case class with
> nested tuples or case classes and/or primitives. It works fine if you
> convert to a DataFrame, too; you can reference nested elements using dot
> notation. I think it would work similarly in Python.
>
> In pseudocode, assuming comma-separated input data:
>
> case class Address(street: String, city: String)
> case class User(name: String, address: Address)
>
> sc.textFile("/path/to/stuff").
>   map { line =>
> line.split(0)
> dean
>
> On Tue, Nov 17, 2015 at 11:06 AM, fhussonnois <[email protected]>
> wrote:
>
>> Hi,
>>
>> I need to convert an RDD[User] to a DataFrame containing a single
>> column named "user". The column "user" should be a nested struct with all
>> User properties.
>>
>> How can I implement this efficiently?
>>
>> Thank you in advance
>