BryanCutler commented on a change in pull request #24650: [SPARK-27778][PYTHON] Fix toPandas conversion of empty DataFrame with Arrow enabled
URL: https://github.com/apache/spark/pull/24650#discussion_r286269838
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -3290,12 +3290,9 @@ class Dataset[T] private[sql](
PythonRDD.serveToStream("serve-Arrow") { outputStream =>
val out = new DataOutputStream(outputStream)
val batchWriter = new ArrowBatchStreamWriter(schema, out, timeZoneId)
- val arrowBatchRdd = toArrowBatchRdd(plan)
- val numPartitions = arrowBatchRdd.partitions.length
// Batches ordered by (index of partition, batch index in that partition) tuple
val batchOrder = ArrayBuffer.empty[(Int, Int)]
- var partitionCount = 0
// Handler to eagerly write batches to Python as they arrive, un-ordered
def handlePartitionBatches(index: Int, arrowBatches: Array[Array[Byte]]): Unit = {
Review comment:
Would you mind changing this to a function value, `val handlePartitionBatches = (index: Int, arrowBatches: Array[Array[Byte]]) => ...`, and then removing the underscore where it is used below?
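A minimal sketch of the suggested change, outside the Spark codebase: a local `def` must be eta-expanded with an underscore (`handlePartitionBatches _`) to be treated as a function value, whereas declaring it as a `val` of function type lets it be passed directly. The names mirror the diff above; the body and the surrounding object are illustrative assumptions, not the actual Dataset.scala logic.

```scala
import scala.collection.mutable.ArrayBuffer

object HandlerSketch {
  // Mimics batchOrder from the diff: (partition index, batch index) pairs
  val batchOrder = ArrayBuffer.empty[(Int, Int)]

  // Before (a method; call sites passing it as a value need `handlePartitionBatches _`):
  //   def handlePartitionBatches(index: Int, arrowBatches: Array[Array[Byte]]): Unit = ...
  // After (a function value; passed directly, no underscore needed):
  val handlePartitionBatches = (index: Int, arrowBatches: Array[Array[Byte]]) => {
    // Illustrative body: record the arrival order of each batch
    arrowBatches.indices.foreach(i => batchOrder += ((index, i)))
  }

  def main(args: Array[String]): Unit = {
    // Usable anywhere a (Int, Array[Array[Byte]]) => Unit is expected, with no `_`
    val callback: (Int, Array[Array[Byte]]) => Unit = handlePartitionBatches
    callback(0, Array(Array[Byte](1, 2), Array[Byte](3)))
    println(batchOrder) // ArrayBuffer((0,0), (0,1))
  }
}
```

The trade-off is purely syntactic here: the `val` form avoids the eta-expansion underscore at the use site, at the cost of losing the method's explicit result-type annotation.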
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]