ScrapCodes edited a comment on pull request #27735:
URL: https://github.com/apache/spark/pull/27735#issuecomment-705428013
So the test is flaky rather than failing consistently: it passed on the second run.
Preserving the log from the failed run for my own exploration later:
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Test basic decommissioning *** FAILED ***
The code passed to eventually never returned normally. Attempted 184 times
over 3.01608704245 minutes. Last failure message: "++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' 3 == 2 ']'
+ '[' 3 == 3 ']'
++ python3 -V
+ pyv3='Python 3.7.3'
+ export PYTHON_VERSION=3.7.3
+ PYTHON_VERSION=3.7.3
+ export PYSPARK_PYTHON=python3
+ PYSPARK_PYTHON=python3
+ export PYSPARK_DRIVER_PYTHON=python3
+ PYSPARK_DRIVER_PYTHON=python3
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf
spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner
local:///opt/spark/tests/decommissioning.py
20/10/07 11:36:40 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Starting decom test
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
20/10/07 11:36:41 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
20/10/07 11:36:41 INFO ResourceUtils:
==============================================================
20/10/07 11:36:41 INFO ResourceUtils: No custom resources configured for
spark.driver.
20/10/07 11:36:41 INFO ResourceUtils:
==============================================================
20/10/07 11:36:41 INFO SparkContext: Submitted application: PyMemoryTest
20/10/07 11:36:41 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: ,
memory -> name: memory, amount: 1024, script: , vendor: ), task resources:
Map(cpus -> name: cpus, amount: 1.0)
20/10/07 11:36:41 INFO ResourceProfile: Limiting resource is cpus at 1
tasks per executor
20/10/07 11:36:41 INFO ResourceProfileManager: Added ResourceProfile id: 0
20/10/07 11:36:41 INFO SecurityManager: Changing view acls to: 185,jenkins
20/10/07 11:36:41 INFO SecurityManager: Changing modify acls to:
185,jenkins
20/10/07 11:36:41 INFO SecurityManager: Changing view acls groups to:
20/10/07 11:36:41 INFO SecurityManager: Changing modify acls groups to:
20/10/07 11:36:41 INFO SecurityManager: SecurityManager: authentication
enabled; ui acls disabled; users with view permissions: Set(185, jenkins);
groups with view permissions: Set(); users with modify permissions: Set(185,
jenkins); groups with modify permissions: Set()
20/10/07 11:36:42 INFO Utils: Successfully started service 'sparkDriver'
on port 7078.
20/10/07 11:36:42 INFO SparkEnv: Registering MapOutputTracker
20/10/07 11:36:42 INFO SparkEnv: Registering BlockManagerMaster
20/10/07 11:36:42 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/10/07 11:36:42 INFO BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
20/10/07 11:36:42 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/10/07 11:36:42 INFO DiskBlockManager: Created local directory at
/var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/blockmgr-754bcb40-809e-4163-a4e2-98af300183f6
20/10/07 11:36:42 INFO MemoryStore: MemoryStore started with capacity
593.9 MiB
20/10/07 11:36:42 INFO SparkEnv: Registering OutputCommitCoordinator
20/10/07 11:36:42 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
20/10/07 11:36:42 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:4040
20/10/07 11:36:43 INFO SparkKubernetesClientFactory: Auto-configuring K8S
client using current context from users K8S config file
20/10/07 11:36:44 INFO ExecutorPodsAllocator: Going to request 3 executors
from Kubernetes.
20/10/07 11:36:44 INFO KubernetesClientUtils: Spark configuration files
loaded from Some(/opt/spark/conf) :
20/10/07 11:36:44 INFO BasicExecutorFeatureStep: Adding decommission
script to lifecycle
20/10/07 11:36:44 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
20/10/07 11:36:44 INFO NettyBlockTransferService: Server created on
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
20/10/07 11:36:44 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
20/10/07 11:36:44 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver,
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc,
7079, None)
20/10/07 11:36:44 INFO BlockManagerMasterEndpoint: Registering block
manager
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
with 593.9 MiB RAM, BlockManagerId(driver,
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc,
7079, None)
20/10/07 11:36:44 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver,
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc,
7079, None)
20/10/07 11:36:44 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver,
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc,
7079, None)
20/10/07 11:36:44 INFO KubernetesClientUtils: Spark configuration files
loaded from Some(/opt/spark/conf) :
20/10/07 11:36:45 INFO BasicExecutorFeatureStep: Adding decommission
script to lifecycle
20/10/07 11:36:45 INFO KubernetesClientUtils: Spark configuration files
loaded from Some(/opt/spark/conf) :
20/10/07 11:36:45 INFO BasicExecutorFeatureStep: Adding decommission
script to lifecycle
20/10/07 11:36:45 INFO KubernetesClientUtils: Spark configuration files
loaded from Some(/opt/spark/conf) :
20/10/07 11:36:50 INFO
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.5:33974) with ID 1,
ResourceProfileId 0
20/10/07 11:36:50 INFO
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.6:53680) with ID 2,
ResourceProfileId 0
20/10/07 11:36:50 INFO BlockManagerMasterEndpoint: Registering block
manager 172.17.0.5:39709 with 593.9 MiB RAM, BlockManagerId(1, 172.17.0.5,
39709, None)
20/10/07 11:36:50 INFO BlockManagerMasterEndpoint: Registering block
manager 172.17.0.6:39551 with 593.9 MiB RAM, BlockManagerId(2, 172.17.0.6,
39551, None)
20/10/07 11:37:14 INFO KubernetesClusterSchedulerBackend: SchedulerBackend
is ready for scheduling beginning after waiting
maxRegisteredResourcesWaitingTime: 30000000000(ns)
20/10/07 11:37:15 INFO SharedState: Setting hive.metastore.warehouse.dir
('null') to the value of spark.sql.warehouse.dir
('file:/opt/spark/work-dir/spark-warehouse').
20/10/07 11:37:15 INFO SharedState: Warehouse path is
'file:/opt/spark/work-dir/spark-warehouse'.
20/10/07 11:37:16 INFO SparkContext: Starting job: collect at
/opt/spark/tests/decommissioning.py:44
20/10/07 11:37:16 INFO DAGScheduler: Registering RDD 2 (groupByKey at
/opt/spark/tests/decommissioning.py:43) as input to shuffle 0
20/10/07 11:37:16 INFO DAGScheduler: Got job 0 (collect at
/opt/spark/tests/decommissioning.py:44) with 5 output partitions
20/10/07 11:37:16 INFO DAGScheduler: Final stage: ResultStage 1 (collect
at /opt/spark/tests/decommissioning.py:44)
20/10/07 11:37:16 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 0)
20/10/07 11:37:16 INFO DAGScheduler: Missing parents: List(ShuffleMapStage
0)
20/10/07 11:37:16 INFO DAGScheduler: Submitting ShuffleMapStage 0
(PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43), which
has no missing parents
20/10/07 11:37:17 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 10.6 KiB, free 593.9 MiB)
20/10/07 11:37:17 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 6.5 KiB, free 593.9 MiB)
20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in
memory on
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
(size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:17 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:1348
20/10/07 11:37:17 INFO DAGScheduler: Submitting 5 missing tasks from
ShuffleMapStage 0 (PairwiseRDD[2] at groupByKey at
/opt/spark/tests/decommissioning.py:43) (first 15 tasks are for partitions
Vector(0, 1, 2, 3, 4))
20/10/07 11:37:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 5 tasks
resource profile 0
20/10/07 11:37:17 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID
0) (172.17.0.5, executor 1, partition 0, PROCESS_LOCAL, 7341 bytes)
taskResourceAssignments Map()
20/10/07 11:37:17 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID
1) (172.17.0.6, executor 2, partition 1, PROCESS_LOCAL, 7341 bytes)
taskResourceAssignments Map()
20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in
memory on 172.17.0.5:39709 (size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in
memory on 172.17.0.6:39551 (size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:19 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID
2) (172.17.0.6, executor 2, partition 2, PROCESS_LOCAL, 7341 bytes)
taskResourceAssignments Map()
20/10/07 11:37:19 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID
1) in 2093 ms on 172.17.0.6 (executor 2) (1/5)
20/10/07 11:37:19 INFO PythonAccumulatorV2: Connected to AccumulatorServer
at host: 127.0.0.1 port: 53377
20/10/07 11:37:19 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID
3) (172.17.0.5, executor 1, partition 3, PROCESS_LOCAL, 7341 bytes)
taskResourceAssignments Map()
20/10/07 11:37:19 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID
0) in 2153 ms on 172.17.0.5 (executor 1) (2/5)
20/10/07 11:37:19 INFO TaskSetManager: Starting task 4.0 in stage 0.0 (TID
4) (172.17.0.6, executor 2, partition 4, PROCESS_LOCAL, 7341 bytes)
taskResourceAssignments Map()
20/10/07 11:37:19 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID
2) in 132 ms on 172.17.0.6 (executor 2) (3/5)
20/10/07 11:37:19 INFO TaskSetManager: Finished task 3.0 in stage 0.0 (TID
3) in 145 ms on 172.17.0.5 (executor 1) (4/5)
20/10/07 11:37:19 INFO TaskSetManager: Finished task 4.0 in stage 0.0 (TID
4) in 99 ms on 172.17.0.6 (executor 2) (5/5)
20/10/07 11:37:19 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks
have all completed, from pool
20/10/07 11:37:19 INFO DAGScheduler: ShuffleMapStage 0 (groupByKey at
/opt/spark/tests/decommissioning.py:43) finished in 2.530 s
20/10/07 11:37:19 INFO DAGScheduler: looking for newly runnable stages
20/10/07 11:37:19 INFO DAGScheduler: running: Set()
20/10/07 11:37:19 INFO DAGScheduler: waiting: Set(ResultStage 1)
20/10/07 11:37:19 INFO DAGScheduler: failed: Set()
20/10/07 11:37:19 INFO DAGScheduler: Submitting ResultStage 1
(PythonRDD[5] at collect at /opt/spark/tests/decommissioning.py:44), which has
no missing parents
20/10/07 11:37:19 INFO MemoryStore: Block broadcast_1 stored as values in
memory (estimated size 9.3 KiB, free 593.9 MiB)
20/10/07 11:37:19 INFO MemoryStore: Block broadcast_1_piece0 stored as
bytes in memory (estimated size 5.4 KiB, free 593.9 MiB)
20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in
memory on
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
(size: 5.4 KiB, free: 593.9 MiB)
20/10/07 11:37:19 INFO SparkContext: Created broadcast 1 from broadcast at
DAGScheduler.scala:1348
20/10/07 11:37:19 INFO DAGScheduler: Submitting 5 missing tasks from
ResultStage 1 (PythonRDD[5] at collect at
/opt/spark/tests/decommissioning.py:44) (first 15 tasks are for partitions
Vector(0, 1, 2, 3, 4))
20/10/07 11:37:19 INFO TaskSchedulerImpl: Adding task set 1.0 with 5 tasks
resource profile 0
20/10/07 11:37:19 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID
5) (172.17.0.5, executor 1, partition 0, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:19 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID
6) (172.17.0.6, executor 2, partition 1, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in
memory on 172.17.0.5:39709 (size: 5.4 KiB, free: 593.9 MiB)
20/10/07 11:37:19 INFO BlockManagerInfo: Added broadcast_1_piece0 in
memory on 172.17.0.6:39551 (size: 5.4 KiB, free: 593.9 MiB)
20/10/07 11:37:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 0 to 172.17.0.5:33974
20/10/07 11:37:20 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 0 to 172.17.0.6:53680
20/10/07 11:37:20 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID
7) (172.17.0.5, executor 1, partition 2, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID
5) in 667 ms on 172.17.0.5 (executor 1) (1/5)
20/10/07 11:37:20 INFO TaskSetManager: Starting task 3.0 in stage 1.0 (TID
8) (172.17.0.6, executor 2, partition 3, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID
6) in 703 ms on 172.17.0.6 (executor 2) (2/5)
20/10/07 11:37:20 INFO TaskSetManager: Starting task 4.0 in stage 1.0 (TID
9) (172.17.0.5, executor 1, partition 4, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:20 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID
7) in 87 ms on 172.17.0.5 (executor 1) (3/5)
20/10/07 11:37:20 INFO TaskSetManager: Finished task 3.0 in stage 1.0 (TID
8) in 119 ms on 172.17.0.6 (executor 2) (4/5)
20/10/07 11:37:20 INFO TaskSetManager: Finished task 4.0 in stage 1.0 (TID
9) in 117 ms on 172.17.0.5 (executor 1) (5/5)
20/10/07 11:37:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
have all completed, from pool
20/10/07 11:37:20 INFO DAGScheduler: ResultStage 1 (collect at
/opt/spark/tests/decommissioning.py:44) finished in 0.898 s
20/10/07 11:37:20 INFO DAGScheduler: Job 0 is finished. Cancelling
potential speculative or zombie tasks for this job
20/10/07 11:37:20 INFO TaskSchedulerImpl: Killing all running tasks in
stage 1: Stage finished
20/10/07 11:37:20 INFO DAGScheduler: Job 0 finished: collect at
/opt/spark/tests/decommissioning.py:44, took 3.730659 s
1st accumulator value is: 100
Waiting to give nodes time to finish migration, decom exec 1.
...
20/10/07 11:37:20 ERROR TaskSchedulerImpl: Lost executor 1 on 172.17.0.5:
The executor with id 1 was deleted by a user or the framework.
20/10/07 11:37:20 INFO DAGScheduler: Executor lost: 1 (epoch 1)
20/10/07 11:37:20 INFO BlockManagerMasterEndpoint: Trying to remove
executor 1 from BlockManagerMaster.
20/10/07 11:37:20 INFO BlockManagerMasterEndpoint: Removing block manager
BlockManagerId(1, 172.17.0.5, 39709, None)
20/10/07 11:37:20 INFO BlockManagerMaster: Removed 1 successfully in
removeExecutor
20/10/07 11:37:20 INFO DAGScheduler: Shuffle files lost for executor: 1
(epoch 1)
20/10/07 11:37:24 INFO ExecutorPodsAllocator: Going to request 1 executors
from Kubernetes.
20/10/07 11:37:24 INFO BasicExecutorFeatureStep: Adding decommission
script to lifecycle
20/10/07 11:37:24 INFO KubernetesClientUtils: Spark configuration files
loaded from Some(/opt/spark/conf) :
20/10/07 11:37:26 INFO
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.5:34196) with ID 3,
ResourceProfileId 0
20/10/07 11:37:26 INFO BlockManagerMasterEndpoint: Registering block
manager 172.17.0.5:38381 with 593.9 MiB RAM, BlockManagerId(3, 172.17.0.5,
38381, None)
20/10/07 11:37:50 INFO SparkContext: Starting job: count at
/opt/spark/tests/decommissioning.py:49
20/10/07 11:37:50 INFO DAGScheduler: Got job 1 (count at
/opt/spark/tests/decommissioning.py:49) with 5 output partitions
20/10/07 11:37:50 INFO DAGScheduler: Final stage: ResultStage 3 (count at
/opt/spark/tests/decommissioning.py:49)
20/10/07 11:37:50 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 2)
20/10/07 11:37:50 INFO DAGScheduler: Missing parents: List(ShuffleMapStage
2)
20/10/07 11:37:50 INFO DAGScheduler: Submitting ShuffleMapStage 2
(PairwiseRDD[2] at groupByKey at /opt/spark/tests/decommissioning.py:43), which
has no missing parents
20/10/07 11:37:50 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 10.6 KiB, free 593.9 MiB)
20/10/07 11:37:50 INFO MemoryStore: Block broadcast_2_piece0 stored as
bytes in memory (estimated size 6.5 KiB, free 593.9 MiB)
20/10/07 11:37:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in
memory on
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
(size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:50 INFO SparkContext: Created broadcast 2 from broadcast at
DAGScheduler.scala:1348
20/10/07 11:37:50 INFO DAGScheduler: Submitting 2 missing tasks from
ShuffleMapStage 2 (PairwiseRDD[2] at groupByKey at
/opt/spark/tests/decommissioning.py:43) (first 15 tasks are for partitions
Vector(0, 3))
20/10/07 11:37:50 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
resource profile 0
20/10/07 11:37:50 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID
10) (172.17.0.5, executor 3, partition 0, PROCESS_LOCAL, 7341 bytes)
taskResourceAssignments Map()
20/10/07 11:37:50 INFO TaskSetManager: Starting task 1.0 in stage 2.0 (TID
11) (172.17.0.6, executor 2, partition 3, PROCESS_LOCAL, 7341 bytes)
taskResourceAssignments Map()
20/10/07 11:37:50 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
in memory (size: 5.4 KiB, free: 593.9 MiB)
20/10/07 11:37:50 INFO BlockManagerInfo: Removed broadcast_1_piece0 on
172.17.0.6:39551 in memory (size: 5.4 KiB, free: 593.9 MiB)
20/10/07 11:37:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in
memory on 172.17.0.6:39551 (size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:50 INFO TaskSetManager: Finished task 1.0 in stage 2.0 (TID
11) in 214 ms on 172.17.0.6 (executor 2) (1/2)
20/10/07 11:37:51 INFO BlockManagerInfo: Added broadcast_2_piece0 in
memory on 172.17.0.5:38381 (size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:52 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID
10) in 1936 ms on 172.17.0.5 (executor 3) (2/2)
20/10/07 11:37:52 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool
20/10/07 11:37:52 INFO DAGScheduler: ShuffleMapStage 2 (groupByKey at
/opt/spark/tests/decommissioning.py:43) finished in 1.985 s
20/10/07 11:37:52 INFO DAGScheduler: looking for newly runnable stages
20/10/07 11:37:52 INFO DAGScheduler: running: Set()
20/10/07 11:37:52 INFO DAGScheduler: waiting: Set(ResultStage 3)
20/10/07 11:37:52 INFO DAGScheduler: failed: Set()
20/10/07 11:37:52 INFO DAGScheduler: Submitting ResultStage 3
(PythonRDD[6] at count at /opt/spark/tests/decommissioning.py:49), which has no
missing parents
20/10/07 11:37:52 INFO MemoryStore: Block broadcast_3 stored as values in
memory (estimated size 10.6 KiB, free 593.9 MiB)
20/10/07 11:37:52 INFO MemoryStore: Block broadcast_3_piece0 stored as
bytes in memory (estimated size 5.9 KiB, free 593.9 MiB)
20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in
memory on
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
(size: 5.9 KiB, free: 593.9 MiB)
20/10/07 11:37:52 INFO SparkContext: Created broadcast 3 from broadcast at
DAGScheduler.scala:1348
20/10/07 11:37:52 INFO DAGScheduler: Submitting 5 missing tasks from
ResultStage 3 (PythonRDD[6] at count at /opt/spark/tests/decommissioning.py:49)
(first 15 tasks are for partitions Vector(0, 1, 2, 3, 4))
20/10/07 11:37:52 INFO TaskSchedulerImpl: Adding task set 3.0 with 5 tasks
resource profile 0
20/10/07 11:37:52 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
12) (172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:52 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID
13) (172.17.0.5, executor 3, partition 1, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in
memory on 172.17.0.6:39551 (size: 5.9 KiB, free: 593.9 MiB)
20/10/07 11:37:52 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 0 to 172.17.0.6:53680
20/10/07 11:37:52 INFO BlockManagerInfo: Added broadcast_3_piece0 in
memory on 172.17.0.5:38381 (size: 5.9 KiB, free: 593.9 MiB)
20/10/07 11:37:52 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID
14) (172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:52 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
12) in 215 ms on 172.17.0.6 (executor 2) (1/5)
20/10/07 11:37:52 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID
15) (172.17.0.6, executor 2, partition 3, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:52 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID
14) in 94 ms on 172.17.0.6 (executor 2) (2/5)
20/10/07 11:37:53 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID
15) in 93 ms on 172.17.0.6 (executor 2) (3/5)
20/10/07 11:37:53 INFO TaskSetManager: Starting task 4.0 in stage 3.0 (TID
16) (172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:53 INFO TaskSetManager: Finished task 4.0 in stage 3.0 (TID
16) in 101 ms on 172.17.0.6 (executor 2) (4/5)
20/10/07 11:37:53 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 0 to 172.17.0.5:34196
20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
in memory (size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
172.17.0.6:39551 in memory (size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:53 INFO BlockManagerInfo: Removed broadcast_2_piece0 on
172.17.0.5:38381 in memory (size: 6.5 KiB, free: 593.9 MiB)
20/10/07 11:37:53 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID
13) in 741 ms on 172.17.0.5 (executor 3) (5/5)
20/10/07 11:37:53 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks
have all completed, from pool
20/10/07 11:37:53 INFO DAGScheduler: ResultStage 3 (count at
/opt/spark/tests/decommissioning.py:49) finished in 0.758 s
20/10/07 11:37:53 INFO DAGScheduler: Job 1 is finished. Cancelling
potential speculative or zombie tasks for this job
20/10/07 11:37:53 INFO TaskSchedulerImpl: Killing all running tasks in
stage 3: Stage finished
20/10/07 11:37:53 INFO DAGScheduler: Job 1 finished: count at
/opt/spark/tests/decommissioning.py:49, took 2.769232 s
20/10/07 11:37:53 INFO SparkContext: Starting job: collect at
/opt/spark/tests/decommissioning.py:50
20/10/07 11:37:53 INFO DAGScheduler: Got job 2 (collect at
/opt/spark/tests/decommissioning.py:50) with 5 output partitions
20/10/07 11:37:53 INFO DAGScheduler: Final stage: ResultStage 5 (collect
at /opt/spark/tests/decommissioning.py:50)
20/10/07 11:37:53 INFO DAGScheduler: Parents of final stage:
List(ShuffleMapStage 4)
20/10/07 11:37:53 INFO DAGScheduler: Missing parents: List()
20/10/07 11:37:53 INFO DAGScheduler: Submitting ResultStage 5
(PythonRDD[5] at collect at /opt/spark/tests/decommissioning.py:44), which has
no missing parents
20/10/07 11:37:53 INFO MemoryStore: Block broadcast_4 stored as values in
memory (estimated size 9.3 KiB, free 593.9 MiB)
20/10/07 11:37:53 INFO MemoryStore: Block broadcast_4_piece0 stored as
bytes in memory (estimated size 5.4 KiB, free 593.9 MiB)
20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in
memory on
spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:7079
(size: 5.4 KiB, free: 593.9 MiB)
20/10/07 11:37:53 INFO SparkContext: Created broadcast 4 from broadcast at
DAGScheduler.scala:1348
20/10/07 11:37:53 INFO DAGScheduler: Submitting 5 missing tasks from
ResultStage 5 (PythonRDD[5] at collect at
/opt/spark/tests/decommissioning.py:44) (first 15 tasks are for partitions
Vector(0, 1, 2, 3, 4))
20/10/07 11:37:53 INFO TaskSchedulerImpl: Adding task set 5.0 with 5 tasks
resource profile 0
20/10/07 11:37:53 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID
17) (172.17.0.6, executor 2, partition 0, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:53 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID
18) (172.17.0.5, executor 3, partition 1, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in
memory on 172.17.0.6:39551 (size: 5.4 KiB, free: 593.9 MiB)
20/10/07 11:37:53 INFO BlockManagerInfo: Added broadcast_4_piece0 in
memory on 172.17.0.5:38381 (size: 5.4 KiB, free: 593.9 MiB)
20/10/07 11:37:53 INFO TaskSetManager: Starting task 2.0 in stage 5.0 (TID
19) (172.17.0.6, executor 2, partition 2, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:53 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID
17) in 120 ms on 172.17.0.6 (executor 2) (1/5)
20/10/07 11:37:53 INFO TaskSetManager: Starting task 3.0 in stage 5.0 (TID
20) (172.17.0.5, executor 3, partition 3, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:53 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID
18) in 144 ms on 172.17.0.5 (executor 3) (2/5)
20/10/07 11:37:53 INFO TaskSetManager: Starting task 4.0 in stage 5.0 (TID
21) (172.17.0.6, executor 2, partition 4, NODE_LOCAL, 7162 bytes)
taskResourceAssignments Map()
20/10/07 11:37:53 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID
19) in 71 ms on 172.17.0.6 (executor 2) (3/5)
20/10/07 11:37:53 INFO TaskSetManager: Finished task 3.0 in stage 5.0 (TID
20) in 83 ms on 172.17.0.5 (executor 3) (4/5)
20/10/07 11:37:53 INFO TaskSetManager: Finished task 4.0 in stage 5.0 (TID
21) in 71 ms on 172.17.0.6 (executor 2) (5/5)
20/10/07 11:37:53 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks
have all completed, from pool
20/10/07 11:37:53 INFO DAGScheduler: ResultStage 5 (collect at
/opt/spark/tests/decommissioning.py:50) finished in 0.276 s
20/10/07 11:37:53 INFO DAGScheduler: Job 2 is finished. Cancelling
potential speculative or zombie tasks for this job
20/10/07 11:37:53 INFO TaskSchedulerImpl: Killing all running tasks in
stage 5: Stage finished
20/10/07 11:37:53 INFO DAGScheduler: Job 2 finished: collect at
/opt/spark/tests/decommissioning.py:50, took 0.286559 s
Final accumulator value is: 140
Finished waiting, stopping Spark.
20/10/07 11:37:53 INFO SparkUI: Stopped Spark web UI at
http://spark-test-app-7c67727502d94a33-driver-svc.eca135fa120b47959f2646646558f6e4.svc:4040
20/10/07 11:37:53 INFO KubernetesClusterSchedulerBackend: Shutting down
all executors
20/10/07 11:37:53 INFO
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each
executor to shut down
20/10/07 11:37:53 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client
has been closed (this is expected if the application is shutting down.)
20/10/07 11:37:53 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
20/10/07 11:37:53 INFO MemoryStore: MemoryStore cleared
20/10/07 11:37:53 INFO BlockManager: BlockManager stopped
20/10/07 11:37:53 INFO BlockManagerMaster: BlockManagerMaster stopped
20/10/07 11:37:53 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
20/10/07 11:37:53 INFO SparkContext: Successfully stopped SparkContext
Done, exiting Python
20/10/07 11:37:54 INFO ShutdownHookManager: Shutdown hook called
20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory
/var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/spark-6d0323a5-e6f6-4c3d-8851-149cb0110338
20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory
/var/data/spark-d602c302-e5f2-44eb-9398-587e9ec828f3/spark-6d0323a5-e6f6-4c3d-8851-149cb0110338/pyspark-264151ef-fe00-4cc8-a175-ed2892c8f74f
20/10/07 11:37:54 INFO ShutdownHookManager: Deleting directory
/tmp/spark-90eaed58-3856-4e81-9460-473673eb0ff1
" did not contain "Received decommission executor message" The application
did not complete, did not find str Received decommission executor message.
(KubernetesSuite.scala:387)
Run completed in 12 minutes, 39 seconds.
Total number of tests run: 18
Suites: completed 2, aborted 0
Tests: succeeded 17, failed 1, canceled 0, ignored 0, pending 0
*** 1 TEST FAILED ***
[INFO]
------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 3.933
s]
[INFO] Spark Project Tags ................................. SUCCESS [ 8.364
s]
[INFO] Spark Project Local DB ............................. SUCCESS [ 3.887
s]
[INFO] Spark Project Networking ........................... SUCCESS [ 5.530
s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 3.191
s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 10.257
s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 3.477
s]
[INFO] Spark Project Core ................................. SUCCESS [02:15
min]
[INFO] Spark Project Kubernetes Integration Tests ......... FAILURE [15:20
min]
[INFO]
------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]
------------------------------------------------------------------------
[INFO] Total time: 18:15 min
[INFO] Finished at: 2020-10-07T04:39:53-07:00
[INFO]
------------------------------------------------------------------------
[ERROR] Failed to execute goal
org.scalatest:scalatest-maven-plugin:2.0.0:test (integration-test) on project
spark-kubernetes-integration-tests_2.12: There are test failures -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please
read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR] mvn <args> -rf :spark-kubernetes-integration-tests_2.12
+ retcode3=1
+ kill -9 62029
+ minikube stop
: Stopping "minikube" in kvm2 ...
- "minikube" stopped.
/tmp/hudson7622963494261370199.sh: line 66: 62029 Killed
minikube mount ${PVC_TESTS_HOST_PATH}:${PVC_TESTS_VM_PATH}
--9p-version=9p2000.L --gid=0 --uid=185
+ [[ 1 = 0 ]]
+ test_status=failure
+ /home/jenkins/bin/post_github_pr_comment.py
Attempting to post to Github...
> Post successful.
+ rm -rf /tmp/tmp.PP0E0J0Req
+ exit 1
Build step 'Execute shell' marked build as failure
Archiving artifacts
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34109/
Test FAILed.
Finished: FAILURE
```
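For context, the failing assertion (KubernetesSuite.scala:387) is a ScalaTest `eventually` check that repeatedly polls the driver pod log until the expected string appears; "Attempted 184 times over 3.01608704245 minutes" is consistent with roughly a 3-minute timeout and a ~1-second polling interval, and the last poll's log contents become the failure message, which is why the whole driver log is dumped above. A minimal sketch of that pattern, assuming a hypothetical `driverPodLog()` helper in place of the suite's real Kubernetes-client pod access:
```scala
import org.scalatest.concurrent.Eventually.eventually
import org.scalatest.concurrent.PatienceConfiguration.{Interval, Timeout}
import org.scalatest.time.SpanSugar._

// Hypothetical stand-in for the suite's real pod-log lookup via the k8s client.
def driverPodLog(): String =
  "20/10/07 11:37:20 INFO ... Received decommission executor message ..."

// Poll until the expected message shows up; if it never does within the
// timeout, eventually fails with the last attempt's message, so the full
// driver log ends up in the test report as seen above.
eventually(Timeout(3.minutes), Interval(1.second)) {
  assert(driverPodLog().contains("Received decommission executor message"))
}
```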