One way to do it: in the Scala API, you would use a tuple or case class
whose fields are nested tuples, case classes, and/or primitives. This works
fine if you convert to a DataFrame, too; you can reference nested elements
using dot notation. I think it would work similarly in Python.
In pseudocode, assuming comma-separated input data:
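case class Address(street: String, city: String)
case class User(name: String, address: Address)

sc.textFile("/path/to/stuff").
  map { line =>
    // Assumes each line has exactly three fields: name,street,city
    val Array(name, street, city) = line.split(",")
    User(name, Address(street, city))
  }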
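
To get the single "user" column asked about below, one option is to wrap each
User in another case class before converting. This is only a sketch under some
assumptions: it uses SparkSession (Spark 2.0+; on 1.x you would import
sqlContext.implicits._ instead), the Record wrapper and the NestedUser object
are names I made up for illustration, and the sample lines and the
name,street,city layout are invented.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Top-level case classes so Spark can derive the schema by reflection.
case class Address(street: String, city: String)
case class User(name: String, address: Address)
// Hypothetical wrapper: its only field becomes the only column, "user".
case class Record(user: User)

object NestedUser {
  // Assumed layout: name,street,city (throws MatchError on any other arity).
  def parse(line: String): User = {
    val Array(name, street, city) = line.split(",").map(_.trim)
    User(name, Address(street, city))
  }

  // Wrap each User so the resulting DataFrame has one struct column.
  def toUserDF(spark: SparkSession, users: RDD[User]): DataFrame = {
    import spark.implicits._
    users.map(Record(_)).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-user")
      .getOrCreate()

    // Stand-in for sc.textFile("/path/to/stuff"); the data is made up.
    val lines = spark.sparkContext.parallelize(Seq(
      "alice,1 Main St,Springfield",
      "bob,2 Oak Ave,Shelbyville"))

    val df = toUserDF(spark, lines.map(parse))
    df.printSchema()

    // Nested fields are reachable with dot notation.
    df.select("user.name", "user.address.city").show()

    spark.stop()
  }
}
```

If you would rather not define a wrapper class, users.map(Tuple1(_)).toDF("user")
should give the same shape.
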
dean
Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
On Tue, Nov 17, 2015 at 11:06 AM, fhussonnois <[email protected]> wrote:
> Hi,
>
> I need to convert an RDD[User] to a DataFrame containing a single
> column named "user". The column "user" should be a nested struct with all
> User properties.
>
> How can I implement this efficiently?
>
> Thank you in advance