cloud-fan commented on a change in pull request #29276:
URL: https://github.com/apache/spark/pull/29276#discussion_r462013325



##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -695,7 +696,7 @@ private[spark] class TaskSetManager(
   def canFetchMoreResults(size: Long): Boolean = sched.synchronized {
     totalResultSize += size
     calculatedTasks += 1
-    if (maxResultSize > 0 && totalResultSize > maxResultSize) {
+    if (!isShuffleMapTasks && maxResultSize > 0 && totalResultSize > maxResultSize) {

Review comment:
       This makes sense to me, as a shuffle map task only returns a `MapStatus`. We don't keep the task result: we just update the map output tracker, update accumulators, update metrics, and then throw the task result away. So we don't need to sum up the task result sizes and check them against the limit.
   
   What do you think? @tgravescs @squito @holdenk  @jiangxb1987 @Ngone51 
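
   For illustration only, the effect of the guard can be sketched in a standalone snippet (the `ResultSizeTracker` class below is a hypothetical stand-in, not Spark code; only `isShuffleMapTasks`, `maxResultSize`, `totalResultSize`, `calculatedTasks`, and `canFetchMoreResults` mirror names from `TaskSetManager`):

```scala
// Simplified model of the patched check: result-size accounting is
// skipped for shuffle map tasks, whose MapStatus results are discarded
// after the map output tracker is updated.
class ResultSizeTracker(isShuffleMapTasks: Boolean, maxResultSize: Long) {
  private var totalResultSize: Long = 0L
  private var calculatedTasks: Int = 0

  // Returns false when accumulated result sizes exceed the limit,
  // but never fails a shuffle map stage, since its results are dropped.
  def canFetchMoreResults(size: Long): Boolean = synchronized {
    totalResultSize += size
    calculatedTasks += 1
    if (!isShuffleMapTasks && maxResultSize > 0 && totalResultSize > maxResultSize) {
      false // the real TaskSetManager would abort the task set here
    } else {
      true
    }
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val resultStage = new ResultSizeTracker(isShuffleMapTasks = false, maxResultSize = 100L)
    val shuffleStage = new ResultSizeTracker(isShuffleMapTasks = true, maxResultSize = 100L)
    println(resultStage.canFetchMoreResults(200L))  // false: over the limit
    println(shuffleStage.canFetchMoreResults(200L)) // true: limit not enforced
  }
}
```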

##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -695,7 +696,7 @@ private[spark] class TaskSetManager(
   def canFetchMoreResults(size: Long): Boolean = sched.synchronized {
     totalResultSize += size
     calculatedTasks += 1
-    if (maxResultSize > 0 && totalResultSize > maxResultSize) {
+    if (!isShuffleMapTasks && maxResultSize > 0 && totalResultSize > maxResultSize) {

Review comment:
       It affects all released versions, but it's not a serious problem (users can work around it by setting a very high max result size conf). I think we don't need to backport this.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


