rlodge commented on a change in pull request #23716: [SPARK-26734] [STREAMING] Fix StackOverflowError with large block queue
URL: https://github.com/apache/spark/pull/23716#discussion_r252897763
 
 

 ##########
 File path: 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceivedBlockTracker.scala
 ##########
 @@ -112,7 +112,7 @@ private[streaming] class ReceivedBlockTracker(
   def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
     if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
       val streamIdToBlocks = streamIds.map { streamId =>
-        (streamId, getReceivedBlockQueue(streamId).clone())
+        (streamId, getReceivedBlockQueue(streamId).clone().dequeueAll(x => true))
 
 Review comment:
   Sure, I could do:
   
   `mutable.ArrayBuffer.empty[ReceivedBlockInfo] ++= getReceivedBlockQueue(streamId).clone()`
   
   which ensures that we know exactly the sequence type being serialized; would you prefer I did that?
   
   https://issues.apache.org/jira/browse/SPARK-23991 caused it in the sense that it switched from using dequeueAll to serializing a clone of the queue directly, and the queue simply doesn't serialize correctly past some number of entries (a Scala bug, IMO). You could say it "uncovered" a Scala bug that was already sitting there, but prior to that change the code wouldn't error with large numbers of entries, because dequeueAll constructs an ArrayBuffer and that's what was being serialized.
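   For context, here is a minimal standalone sketch of why the copy into an ArrayBuffer matters. It is not part of the patch; it assumes Scala 2.12, where `mutable.Queue` is backed by a linked `MutableList`, so default Java serialization can recurse once per node and blow the stack on a large queue, while an array-backed `ArrayBuffer` of the same elements serializes fine. The object and helper names here are illustrative only:
   
   ```scala
   import java.io.{ByteArrayOutputStream, ObjectOutputStream}
   import scala.collection.mutable
   
   object QueueSerDemo {
     // Attempt plain Java serialization of a value; report whether it succeeded.
     // Catching StackOverflowError is only acceptable in a throwaway demo.
     def serializes(obj: AnyRef): Boolean =
       try {
         val oos = new ObjectOutputStream(new ByteArrayOutputStream())
         oos.writeObject(obj)
         oos.close()
         true
       } catch {
         case _: StackOverflowError => false
       }
   
     def main(args: Array[String]): Unit = {
       val n = 200000
       // On Scala 2.12 this is a linked structure; serialization may recurse
       // per element and fail for large n (behavior differs on 2.13+).
       val queue = mutable.Queue(1 to n: _*)
       // Copying into an ArrayBuffer (what dequeueAll effectively produced)
       // yields an array-backed collection that serializes without deep recursion.
       val buf = mutable.ArrayBuffer.empty[Int] ++= queue
       println(s"queue serializes:  ${serializes(queue)}")
       println(s"buffer serializes: ${serializes(buf)}")
     }
   }
   ```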

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]

