This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
     new 3a8398d  [SPARK-26660][FOLLOWUP] Raise task serialized size warning threshold to 1000 KiB

3a8398d is described below

commit 3a8398df5cf87f597e672bfbb8c6eadbad800d03
Author: Sean Owen <sean.o...@databricks.com>
AuthorDate: Wed Mar 27 10:42:26 2019 +0900

    [SPARK-26660][FOLLOWUP] Raise task serialized size warning threshold to 1000 KiB

    ## What changes were proposed in this pull request?

    Raise the threshold size for serialized task size at which a warning
    is generated, from 100 KiB to 1000 KiB. As several people have noted, the
    original change for this JIRA highlighted that this threshold is low.
    Test output regularly shows:

    ```
    - sorting on StringType with nullable=false, sortOrder=List('a DESC NULLS LAST)
    22:47:53.320 WARN org.apache.spark.scheduler.TaskSetManager: Stage 80 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
    22:47:53.348 WARN org.apache.spark.scheduler.TaskSetManager: Stage 81 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
    22:47:53.417 WARN org.apache.spark.scheduler.TaskSetManager: Stage 83 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
    22:47:53.444 WARN org.apache.spark.scheduler.TaskSetManager: Stage 84 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
    ...
    - SPARK-20688: correctly check analysis for scalar sub-queries
    22:49:10.314 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.8 KiB
    - SPARK-21835: Join in correlated subquery should be duplicateResolved: case 1
    22:49:10.595 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
    22:49:10.744 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
    22:49:10.894 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
    - SPARK-21835: Join in correlated subquery should be duplicateResolved: case 2
    - SPARK-21835: Join in correlated subquery should be duplicateResolved: case 3
    - SPARK-23316: AnalysisException after max iteration reached for IN query
    22:49:11.559 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 154.2 KiB
    ```

    It seems that a larger threshold of about 1 MB is more suitable.

    ## How was this patch tested?

    Existing tests.

    Closes #24226 from srowen/SPARK-26660.2.

    Authored-by: Sean Owen <sean.o...@databricks.com>
    Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
---
 core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
index ea31fe8..3977c0b 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
@@ -1111,5 +1111,5 @@ private[spark] class TaskSetManager(
 private[spark] object TaskSetManager {
   // The user will be warned if any stages contain a task that has a serialized size greater than
   // this.
-  val TASK_SIZE_TO_WARN_KIB = 100
+  val TASK_SIZE_TO_WARN_KIB = 1000
 }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
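[Editor's note] For readers skimming this archive, the effect of the one-line change above can be illustrated with a simplified sketch of the threshold check. Only the `TASK_SIZE_TO_WARN_KIB` constant and its KiB unit come from the patch; the surrounding object, method, and message format below are illustrative, not the actual Spark source:

```scala
// Simplified sketch of a task-size warning check, assuming a byte count
// for the serialized task. TASK_SIZE_TO_WARN_KIB mirrors the patched
// constant; everything else here is hypothetical.
object TaskSizeWarning {
  // Raised from 100 to 1000 by this commit.
  val TASK_SIZE_TO_WARN_KIB = 1000

  // Returns a warning message if the serialized task exceeds the threshold.
  def checkTaskSize(stageId: Int, serializedBytes: Long): Option[String] = {
    if (serializedBytes > TASK_SIZE_TO_WARN_KIB * 1024L) {
      Some(s"Stage $stageId contains a task of very large size " +
        s"(${serializedBytes / 1024} KiB). The maximum recommended task " +
        s"size is $TASK_SIZE_TO_WARN_KIB KiB.")
    } else {
      None
    }
  }
}

object Main extends App {
  // A 755 KiB task, as seen in the log output above: it warned under the
  // old 100 KiB threshold but is under the new 1000 KiB one.
  println(TaskSizeWarning.checkTaskSize(80, 755L * 1024).isDefined)   // false
  // A task just over 1000 KiB still triggers the warning.
  println(TaskSizeWarning.checkTaskSize(80, 1500L * 1024).isDefined)  // true
}
```

In other words, the patch does not change the warning logic, only where the line is drawn: tasks in the several-hundred-KiB range common in the test suites no longer generate noise, while genuinely oversized tasks still do.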