Github user sethah commented on a diff in the pull request:
https://github.com/apache/spark/pull/21546#discussion_r199478745
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala ---
@@ -34,17 +33,19 @@ private[sql] object PythonSQLUtils {
  }

  /**
-  * Python Callable function to convert ArrowPayloads into a [[DataFrame]].
+  * Python callable function to read a file in Arrow stream format and create a [[DataFrame]]
+  * using each serialized ArrowRecordBatch as a partition.
   *
-  * @param payloadRDD A JavaRDD of ArrowPayloads.
-  * @param schemaString JSON Formatted Schema for ArrowPayloads.
   * @param sqlContext The active [[SQLContext]].
-  * @return The converted [[DataFrame]].
+  * @param filename File to read the Arrow stream from.
+  * @param schemaString JSON Formatted Spark schema for Arrow batches.
+  * @return A new [[DataFrame]].
   */
- def arrowPayloadToDataFrame(
-     payloadRDD: JavaRDD[Array[Byte]],
-     schemaString: String,
-     sqlContext: SQLContext): DataFrame = {
-   ArrowConverters.toDataFrame(payloadRDD, schemaString, sqlContext)
+ def arrowReadStreamFromFile(
--- End diff --
Can we call it `arrowFileToDataFrame` or something similar? `arrowReadStreamFromFile` and `readArrowStreamFromFile` are just too similar...
---