Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/21370#discussion_r194784664
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3209,6 +3222,19 @@ class Dataset[T] private[sql](
}
}
+  private[sql] def getRowsToPython(
+      _numRows: Int,
+      truncate: Int,
+      vertical: Boolean): Array[Any] = {
+    EvaluatePython.registerPicklers()
+    val numRows = _numRows.max(0).min(Int.MaxValue - 1)
+    val rows = getRows(numRows, truncate, vertical).map(_.toArray).toArray
+    val toJava: (Any) => Any = EvaluatePython.toJava(_, ArrayType(ArrayType(StringType)))
+    val iter: Iterator[Array[Byte]] = new SerDeUtil.AutoBatchedPickler(
+      rows.iterator.map(toJava))
+    PythonRDD.serveIterator(iter, "serve-GetRows")
--- End diff --
Same answer as @HyukjinKwon about the return type: the exact type we serve here is
`Array[Array[String]]`, which is what the `ArrayType(ArrayType(StringType))` passed to the `toJava` func encodes.
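For illustration only (a minimal plain-Scala sketch, not Spark's actual implementation): the payload being pickled is logically an `Array[Array[String]]`, i.e. a header row followed by rendered data rows, and `_numRows` is clamped the same way the diff does. The sample data and the `clampNumRows` helper name are hypothetical.

```scala
object GetRowsShapeSketch {
  // Hypothetical sample of what getRows produces before pickling:
  // each inner Array[String] is one rendered row; the first is the header.
  val rows: Array[Array[String]] = Array(
    Array("id", "name"), // header row
    Array("1", "alice"),
    Array("2", "bob"))

  // Clamp the requested row count as in the diff: never negative,
  // and capped below Int.MaxValue to leave headroom downstream.
  def clampNumRows(n: Int): Int = n.max(0).min(Int.MaxValue - 1)

  def main(args: Array[String]): Unit = {
    println(clampNumRows(-5)) // prints 0
    println(rows.map(_.mkString("|")).mkString("\n"))
  }
}
```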
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]