GitHub user rayortigas commented on a diff in the pull request:
https://github.com/apache/spark/pull/5713#discussion_r40514510
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala ---
@@ -1523,6 +1523,71 @@ class DataFrame private[sql](
}
/**
+ * :: Experimental ::
+ *
+ * Returns the content of the [[DataFrame]] as an [[RDD]] of the given type `T`, where `T` is a
+ * subtype of [[scala.Product]] (typically a case class).
+ *
+ * For example, given a case class `Food`
+ *
+ * {{{
+ * case class Food(name: String, count: Int)
+ * }}}
+ *
+ * the following example shows how a [[DataFrame]] derived from an `RDD[Food]` can be
+ * reconstituted into another `RDD[Food]` with the same elements:
+ *
+ * {{{
+ * val rdd0 = sc.parallelize(Seq(Food("apple", 1), Food("banana", 2), Food("cherry", 3)))
+ * val df0 = rdd0.toDF()
+ * df0.save("foods.parquet")
+ *
+ * val df1 = sqlContext.load("foods.parquet")
+ * val rdd1 = df1.toTypedRDD[Food]()
+ * // rdd0 and rdd1 should have the same elements
+ * }}}
+ *
+ * This method makes a best effort to validate, up front, i.e. before the RDD is materialized,
+ * that `T` is compatible with this DataFrame's [[schema]] and will throw an
+ * `IllegalArgumentException` if it isn't. Any other problems with the schema or conversion should
+ * manifest as exceptions when materializing the RDD.
+ *
+ * `toTypedRDD` can reconstruct most but not all `T`. For example, if `T` has a field of type
+ * `Array` whose corresponding Catalyst type is `ArrayType`, `toTypedRDD` cannot rebuild the array
+ * because of limitations with reflection. (`toTypedRDD` can only build `Seq` fields from Catalyst
+ * values of `ArrayType`.)
+ *
+ * This method cannot reconstruct classes defined in the Spark shell. Before using the shell, you
+ * should compile any classes you want to use with `toTypedRDD`.
--- End diff --
This paragraph about the limitations with the REPL is new.