GitHub user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/21546#discussion_r195499089
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala ---
@@ -34,17 +34,36 @@ private[sql] object PythonSQLUtils {
}
/**
- * Python Callable function to convert ArrowPayloads into a [[DataFrame]].
+ * Python callable function to convert an RDD of serialized ArrowRecordBatches into
+ * a [[DataFrame]].
*
- * @param payloadRDD A JavaRDD of ArrowPayloads.
- * @param schemaString JSON Formatted Schema for ArrowPayloads.
+ * @param arrowBatchRDD A JavaRDD of serialized ArrowRecordBatches.
+ * @param schemaString JSON Formatted Spark schema for Arrow batches.
* @param sqlContext The active [[SQLContext]].
* @return The converted [[DataFrame]].
*/
- def arrowPayloadToDataFrame(
- payloadRDD: JavaRDD[Array[Byte]],
+ def arrowStreamToDataFrame(
--- End diff --
It's public so it can be called from Python through Py4J.
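For context on the mechanism: Py4J reaches JVM methods through Java reflection, so roughly speaking only methods that are public in the compiled bytecode can be called from Python. Scala's qualified modifiers such as the `private[sql]` on the enclosing object are enforced only by the Scala compiler and are emitted as public in bytecode, which is why the object stays reachable. A plain Scala `private` method, however, stays private in bytecode. The sketch below illustrates only the reflection side under those assumptions. It does not use Py4J, and `HypotheticalSQLUtils`, `arrowBatchCount`, `internalHelper`, and `ReflectionDemo` are hypothetical names, not Spark APIs.

```scala
// Hypothetical stand-in for a utility object called from Python; not Spark's PythonSQLUtils.
object HypotheticalSQLUtils {
  // Public: visible to Java reflection, so a reflection-based bridge could call it.
  def arrowBatchCount(batches: Array[Array[Byte]]): Int = batches.length

  // Scala `private`: compiled as a private JVM method, so getMethod cannot find it.
  private def internalHelper(x: Int): Int = x * 2
}

object ReflectionDemo {
  // True if `name` with the given parameter types is a public method on the
  // runtime class of `target`, i.e. what a reflection-based caller can see.
  def isPubliclyCallable(target: AnyRef, name: String, paramTypes: Class[_]*): Boolean =
    try {
      target.getClass.getMethod(name, paramTypes: _*)
      true
    } catch {
      case _: NoSuchMethodException => false
    }

  def main(args: Array[String]): Unit = {
    val target = HypotheticalSQLUtils
    // Look up and invoke the public method reflectively (a rough analogy to a Py4J call).
    val m = target.getClass.getMethod("arrowBatchCount", classOf[Array[Array[Byte]]])
    val batches = Array(Array[Byte](1, 2), Array[Byte](3))
    val result = m.invoke(target, batches.asInstanceOf[AnyRef])
    println(result) // prints 2
    // The Scala-private helper is not visible to reflection-based callers.
    println(isPubliclyCallable(target, "internalHelper", classOf[Int])) // prints false
  }
}
```

Making such a method private would therefore hide it from the Python side, even though the Scala code itself would still compile.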