Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21385#discussion_r190120836
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala
---
@@ -56,20 +69,73 @@ private[shuffle] class UnsafeRowReceiver(
override def receiveAndReply(context: RpcCallContext):
PartialFunction[Any, Unit] = {
case r: UnsafeRowReceiverMessage =>
- queue.put(r)
+ queues(r.writerId).put(r)
context.reply(())
}
override def read(): Iterator[UnsafeRow] = {
new NextIterator[UnsafeRow] {
- override def getNext(): UnsafeRow = queue.take() match {
- case ReceiverRow(r) => r
- case ReceiverEpochMarker() =>
- finished = true
- null
+ // An array of flags for whether each writer ID has gotten an epoch marker.
+ private val writerEpochMarkersReceived =
--- End diff --
The map will only ever contain `(writerId, true)` entries, so the value is
never needed. We only care about the writerId, whose range is
`0 until numShuffleWriters`, so it might be worth considering an alternative.
This could also be a Set pre-initialized with `0 until numShuffleWriters`,
from which we remove an element when we receive the epoch marker for that
writer. If an element is still in the set, we haven't received a marker from
that writer yet.
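A minimal sketch of the Set-based idea, to make the suggestion concrete. The class name `PendingWriters` and its methods are hypothetical and not part of the PR; only `numShuffleWriters` and the writer ID range come from the code under review.

```scala
import scala.collection.mutable

// Hypothetical helper (not in the PR): tracks which writers still owe an
// epoch marker for the current epoch.
class PendingWriters(numShuffleWriters: Int) {
  // Start with every writer ID in 0 until numShuffleWriters pending.
  private val pending = mutable.Set.empty[Int]
  pending ++= (0 until numShuffleWriters)

  // Call when an epoch marker arrives from the given writer.
  def markReceived(writerId: Int): Unit = pending -= writerId

  // True once no writer is left in the set, i.e. the epoch is complete.
  def allReceived: Boolean = pending.isEmpty

  // Refill the set before starting the next epoch.
  def reset(): Unit = pending ++= (0 until numShuffleWriters)
}
```

With this shape, the "epoch finished" check is just an emptiness test, and membership in the set directly means "no marker from this writer yet".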
Similarly, it could be a pre-initialized Array of Boolean, indexed by
writerId, holding true/false for each writer.
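A rough sketch of the Array[Boolean] variant, under the same assumptions: `EpochMarkerFlags` and its method names are made up for illustration, and only `numShuffleWriters` is taken from the PR.

```scala
// Hypothetical helper (not in the PR): one flag per writer ID, where
// true means an epoch marker has been received from that writer.
class EpochMarkerFlags(numShuffleWriters: Int) {
  // Pre-initialized to false for every writer ID in 0 until numShuffleWriters.
  private val received: Array[Boolean] = Array.fill(numShuffleWriters)(false)

  // Call when an epoch marker arrives from the given writer.
  def markReceived(writerId: Int): Unit = {
    received(writerId) = true
  }

  // True once every writer's flag has been set.
  def allReceived: Boolean = received.forall(identity)

  // Clear all flags before starting the next epoch.
  def reset(): Unit = java.util.Arrays.fill(received, false)
}
```

Compared with a Set, the array avoids hashing and boxing, at the cost of scanning all flags to check for completion; a counter of received markers could avoid that scan if it matters.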
---