hvanhovell commented on code in PR #40610:
URL: https://github.com/apache/spark/pull/40610#discussion_r1153980266
##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala:
##########
@@ -134,24 +134,41 @@ private[sql] class SparkResult[T](
/**
* Returns an iterator over the contents of the result.
*/
- def iterator: java.util.Iterator[T] with AutoCloseable = {
+ def iterator: java.util.Iterator[T] with AutoCloseable =
+ buildIterator(destructive = false)
+
+ /**
+ * Returns an destructive iterator over the contents of the result.
+ */
+ def destructiveIterator: java.util.Iterator[T] with AutoCloseable =
+ buildIterator(destructive = true)
+
+ private def buildIterator(destructive: Boolean): java.util.Iterator[T] with AutoCloseable = {
new java.util.Iterator[T] with AutoCloseable {
- private[this] var batchIndex: Int = -1
private[this] var iterator: java.util.Iterator[InternalRow] = Collections.emptyIterator()
private[this] var deserializer: Deserializer[T] = _
+ private[this] var currentBatch: ColumnarBatch = _
+ private[this] val _destructive: Boolean = destructive
+
override def hasNext: Boolean = {
if (iterator.hasNext) {
return true
}
- val nextBatchIndex = batchIndex + 1
+ val batchIndex = batches.indexOf(currentBatch)
Review Comment:
I have been looking at this for a bit now, and I am not sure I like it.
There are two issues:
- In destructive mode you already know where the current batch is: it should
be at index 0. In non-destructive mode it should be at `batchIndex`. The
`indexOf` lookup does not use that information.
- The removal can be pretty expensive, since we are removing from the head.
I am wondering if we can use a better-suited data structure here. You could
use a map, since that gives you cheap removals and fairly fast lookups.
Alternatively, we could implement something akin to a linked list (I don't think
you can use a stock linked list, since those don't like updates during
iteration).
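
To make the map suggestion concrete, here is a rough sketch, not the PR's code. `BatchStore`, `append`, and the type parameter `B` are hypothetical stand-ins for the batch storage in `SparkResult`, and the sketch assumes all batches are already loaded. Batches are keyed by their arrival index, so the iterator only tracks an `Int` cursor and never needs `indexOf`; destructive mode removes each entry as it is handed out.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the batch storage; names are illustrative only.
final class BatchStore[B] {
  // Batches keyed by arrival order: removal and lookup are both hash operations.
  private val batches = mutable.HashMap.empty[Int, B]
  private var nextIndex = 0

  def append(batch: B): Unit = {
    batches.update(nextIndex, batch)
    nextIndex += 1
  }

  def size: Int = batches.size

  def iterator(destructive: Boolean): Iterator[B] = new Iterator[B] {
    // The cursor is the only state needed to locate the current batch.
    private var index = 0

    override def hasNext: Boolean = batches.contains(index)

    override def next(): B = {
      // Destructive mode drops the entry from the map as it is returned.
      val found = if (destructive) batches.remove(index) else batches.get(index)
      val batch = found.getOrElse(throw new NoSuchElementException(s"no batch at index $index"))
      index += 1
      batch
    }
  }
}
```

With this shape, a non-destructive pass leaves the map untouched, while a destructive pass empties it one batch at a time. A real implementation would also have to handle batches that arrive lazily, which this sketch leaves out.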