wankunde commented on pull request #32866:
URL: https://github.com/apache/spark/pull/32866#issuecomment-932975741


   Hi, @Ngone51 @LuciferYang 
   
   In our production environment, some executors fail to kill tasks: the 
task reaper keeps reporting the killed task as still running. Could you help 
me figure out why?
   
   Reaper thread log:
   
   ```
   21/09/27 23:44:24,882 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:44:34,882 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2240871 ms
   21/09/27 23:44:34,885 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:44:44,885 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2250874 ms
   21/09/27 23:44:44,888 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:44:54,888 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2260877 ms
   21/09/27 23:44:54,891 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:04,891 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2270880 ms
   21/09/27 23:45:04,894 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:14,894 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2280883 ms
   21/09/27 23:45:14,896 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:24,897 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2290886 ms
   21/09/27 23:45:24,899 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:34,899 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2300888 ms
   21/09/27 23:45:34,902 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   21/09/27 23:45:44,902 WARN [Task reaper-1745] executor.Executor:69 : Killed task 768777879 is still running after 2310891 ms
   21/09/27 23:45:44,904 WARN [Task reaper-1745] executor.Executor:69 : Thread dump from task 768777879:
   ```
   
   Task Thread stack:
   ```sh
   "Executor 553 task launch worker for task 768777879, task 26.0 in stage 1285726.0 of app application_1630907351152_13315" #1106477 daemon prio=5 os_prio=0 tid=0x000000002a6b2000 nid=0x20b9f runnable [0x00007f87a9039000]
      java.lang.Thread.State: RUNNABLE
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering.compare_0_0$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering.compare(Unknown Source)
           at org.apache.spark.sql.execution.UnsafeKVExternalSorter$KVComparator.compare(UnsafeKVExternalSorter.java:272)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:70)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:44)
           at org.apache.spark.util.collection.TimSort$SortState.gallopRight(TimSort.java:638)
           at org.apache.spark.util.collection.TimSort$SortState.mergeHi(TimSort.java:887)
           at org.apache.spark.util.collection.TimSort$SortState.mergeAt(TimSort.java:536)
           at org.apache.spark.util.collection.TimSort$SortState.mergeCollapse(TimSort.java:462)
           at org.apache.spark.util.collection.TimSort$SortState.access$200(TimSort.java:325)
           at org.apache.spark.util.collection.TimSort.sort(TimSort.java:153)
           at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:364)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221)
           at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.createWithExistingInMemorySorter(UnsafeExternalSorter.java:111)
           at org.apache.spark.sql.execution.UnsafeKVExternalSorter.<init>(UnsafeKVExternalSorter.java:158)
           at org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter(UnsafeFixedWidthAggregationMap.java:248)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doConsume_0$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithKeys_1$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithKeys_0$(Unknown Source)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
           at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:50)
           at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:730)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
           at org.apache.spark.rdd.RDD$$anon$2.hasNext(RDD.scala:332)
           at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:176)
           at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
           at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
           at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
           at org.apache.spark.scheduler.Task.run(Task.scala:127)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:486)
           at org.apache.spark.executor.Executor$TaskRunner$$Lambda$533/2066049817.apply(Unknown Source)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1379)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:489)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
           at java.lang.Thread.run(Thread.java:748)
   
      Locked ownable synchronizers:
           - <0x00007f8c72788150> (a java.util.concurrent.ThreadPoolExecutor$Worker)
   ```
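
One possible reading of the stack above (a guess, not a confirmed diagnosis): the task thread is `RUNNABLE` inside a comparator called from `TimSort`, which is a CPU-bound loop. In plain Java, `Thread.interrupt()` only sets a flag, so a loop that never blocks and never polls that flag keeps running after it has been interrupted. The sketch below illustrates that general JVM behavior only. `InterruptDemo`, `startLoop`, and `stopsOnInterrupt` are hypothetical names made up for this example and are not Spark code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptDemo {

    // Hypothetical stand-in for a CPU-bound loop such as a long in-memory sort.
    // When pollInterrupt is false, the loop never looks at the interrupt flag.
    static Thread startLoop(boolean pollInterrupt, AtomicBoolean stop) {
        Thread t = new Thread(() -> {
            while (!stop.get()) {
                if (pollInterrupt && Thread.currentThread().isInterrupted()) {
                    return; // cooperative exit once the flag is observed
                }
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Interrupts the loop and reports whether it exited within 500 ms.
    static boolean stopsOnInterrupt(boolean pollInterrupt) {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread t = startLoop(pollInterrupt, stop);
        try {
            t.interrupt();   // only sets the thread's interrupt flag
            t.join(500);     // give the loop time to react
            boolean stopped = !t.isAlive();
            stop.set(true);  // escape hatch so the demo thread always ends
            t.join();
            return stopped;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("non-polling loop stopped: " + stopsOnInterrupt(false));
        System.out.println("polling loop stopped: " + stopsOnInterrupt(true));
    }
}
```

If the sort path in the stack trace behaves like the non-polling loop, the reaper would keep logging "still running" until the sort finishes. Whether that is what happens here still needs to be confirmed against the Spark source.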


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


