I did some tests on a three-node Dataproc cluster with autoscaling on: one
master node and two worker nodes. The master node was called ctpcluster-m and
the worker nodes were ctpcluster-w-0 and ctpcluster-w-1 respectively.
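For reference, "autoscaling on" here means an autoscaling policy attached to
the cluster. I have not reproduced the exact policy, but a minimal sketch
along the following lines would do it (the policy id, scaling factors and
decommission timeout are illustrative assumptions, not the values actually
used; the cluster name and region are inferred from the node names and the
log output below):

cat > autoscaling-policy.yaml <<'EOF'
# Primary workers stay fixed at two; scaling primaries is not recommended
# because of HDFS (see the quoted text further down).
workerConfig:
  minInstances: 2
  maxInstances: 2
# Secondary workers are the ones that scale out and in.
secondaryWorkerConfig:
  minInstances: 0
  maxInstances: 10
basicAlgorithm:
  # 120s is the default cooldown mentioned further down
  cooldownPeriod: 120s
  yarnConfig:
    scaleUpFactor: 1.0
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 3600s
EOF

gcloud dataproc autoscaling-policies import ctp-autoscaling-policy \
    --source=autoscaling-policy.yaml --region=europe-west2

gcloud dataproc clusters update ctpcluster \
    --autoscaling-policy=ctp-autoscaling-policy --region=europe-west2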
I started a spark-submit job with the following dynamic allocation
(Spark-level autoscaling) parameters added:
spark-submit --verbose \
--deploy-mode client \
--conf "spark.yarn.appMasterEnv.SPARK_HOME=$SPARK_HOME" \
--conf "spark.yarn.appMasterEnv.PYTHONPATH=${PYTHONPATH}" \
--conf "spark.executorEnv.PYTHONPATH=${PYTHONPATH}" \
--py-files $CODE_DIRECTORY_CLOUD/spark_on_gke.zip \
--conf "spark.driver.memory"=4G \
--conf "spark.executor.memory"=4G \
--conf "spark.num.executors"=4 \
--conf "spark.executor.cores"=2 \
--conf spark.dynamicAllocation.enabled="true" \
--conf spark.shuffle.service.enabled="true" \
--conf spark.dynamicAllocation.minExecutors=2 \
--conf spark.dynamicAllocation.maxExecutors=10 \
--conf spark.dynamicAllocation.initialExecutors=4 \
$CODE_DIRECTORY_CLOUD/${APPLICATION}

Note that spark.dynamicAllocation.initialExecutors is set to 4, the same
value as spark.executor.instances above.
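As an aside, on Dataproc the same job can also be submitted through the
Dataproc jobs API rather than calling spark-submit on the master directly.
A rough equivalent would be something like the snippet below (a sketch, not
the command I actually ran; cluster name and region are inferred as above):

gcloud dataproc jobs submit pyspark $CODE_DIRECTORY_CLOUD/${APPLICATION} \
    --cluster=ctpcluster \
    --region=europe-west2 \
    --py-files=$CODE_DIRECTORY_CLOUD/spark_on_gke.zip \
    --properties=spark.driver.memory=4G,spark.executor.memory=4G,spark.executor.instances=4,spark.executor.cores=2,spark.dynamicAllocation.enabled=true,spark.shuffle.service.enabled=true,spark.dynamicAllocation.minExecutors=2,spark.dynamicAllocation.maxExecutors=10,spark.dynamicAllocation.initialExecutors=4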
Once I started the submit job, I shut down the ctpcluster-w-1 node immediately.
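One way to shut a worker down, as a sketch, is a straight VM stop from
outside the cluster (the zone is taken from the hostnames in the log below):

gcloud compute instances stop ctpcluster-w-1 --zone=europe-west2-c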
These were the diagnostics thrown out:
22/02/05 18:37:12 INFO
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted
application application_1644085520369_0003
22/02/05 18:37:13 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to
ResourceManager at ctpcluster-m/10.154.15.193:8030
Started at
05/02/2022 18:37:18.18
22/02/05 18:37:26 WARN
org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint:
*Requesting driver to remove executor 3 for reason Container marked as
failed*: *container_1644085520369_0003_01_03* on host:
ctpcluster-w-1.europe-west2-c.c.xxx.internal. Exit status: -100.
Diagnostics: Container released on a *lost* node.
22/02/05 18:37:26 ERROR org.apache.spark.scheduler.cluster.YarnScheduler: *Lost
executor 3 on ctpcluster-w-1.europe-west2-c.c.xxx.internal: Container
marked as failed:* container_1644085520369_0003_01_03 on host:
ctpcluster-w-1.europe-west2-c.c.xxx.internal. Exit status: -100.
Diagnostics: Container released on a *lost* node.
22/02/05 18:37:26 ERROR org.apache.spark.scheduler.cluster.YarnScheduler: *Lost
executor 1 on ctpcluster-w-1.europe-west2-c.c.xxx.internal: Container
marked as failed:* *container_1644085520369_0003_01_01* on host:
ctpcluster-w-1.europe-west2-c.c.xxx.internal. Exit status: -100.
Diagnostics: Container released on a *lost* node.
22/02/05 18:37:26 WARN
org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint:
Requesting driver to remove executor 1 for reason Container marked as
failed: container_1644085520369_0003_01_01 on host:
ctpcluster-w-1.europe-west2-c.c.xxx.internal. Exit status: -100.
Diagnostics: Container released on a *lost* node.
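At this point YARN treats the node manager on ctpcluster-w-1 as lost. As a
sketch, this can be confirmed from the master node with the YARN CLI:

# on ctpcluster-m: list all node managers, including lost/unhealthy ones
yarn node -list -all
# or restrict the listing to lost nodes only
yarn node -list -states LOST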
So two of the original four containers were lost, as they were running on
the lost node. There was no attempt to autoscale to replace the lost worker
node. The job simply completed on the two remaining containers on
ctpcluster-w-0.
My conclusion is that autoscaling only applies to a workload starting from
a clean state; it does not kick in to replace capacity lost while a job is
running.
HTH
On Sat, 5 Feb 2022 at 09:03, Mich Talebzadeh
wrote:
>
> This question arises when Spark is offered as a managed service on a
> cluster of VMs in Cloud. For example, Google Dataproc
> <https://cloud.google.com/dataproc> or Amazon EMR
> <https://aws.amazon.com/emr/> among others.
>
> From what I can see in the autoscaling setup, you will always need a
> minimum of two primary worker nodes. The documentation also states, and I
> quote, "Scaling primary workers is not recommended due to HDFS limitations
> which result in instability while scaling. These limitations do not exist
> for secondary workers". So scaling is done through the secondary workers,
> by specifying the minimum and maximum number of instances. The so-called
> autoscaling cooldown duration also defaults to 2 minutes, presumably to
> allow time for new executors to come online. My assumption is that task
> allocation to the new executors is FIFO for new tasks. This link
> <https://docs.qubole.com/en/latest/admin-guide/engine-admin/spark-admin/autoscale-spark.html#:~:text=dynamic%20allocation%20configurations.-,Autoscaling%20in%20Spark%20Clusters,scales%20down%20towards%20the%20minimum.=