It's actually on AWS EMR. The job bootstraps and runs fine -- the
autoscaling group brings up a service that Spark will be calling. Some
code waits for the autoscaling group to come up before continuing
processing in Spark, since the Spark cluster will need to make requests to
the service in the autoscaling group. It takes several minutes for the
service to come up, and during the wait, Spark starts emitting these thread
dumps, presumably because it thinks something is wrong since the executor is
busy-waiting and not doing anything. The previous version of Spark (2.4.4)
did not do this.
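For what it's worth, the wait described above can be sketched as a small driver-side polling loop. This is only an illustrative sketch, not the poster's actual code: the `is_ready` callable is a hypothetical stand-in for whatever check is used (e.g. a boto3 `describe_auto_scaling_groups` call). If the wait currently runs inside an executor task, moving it to the driver like this might sidestep the thread dumps, since the executors would sit idle rather than busy-waiting in a task.

```python
import time


def wait_until_ready(is_ready, timeout_s=900, poll_s=15):
    """Poll is_ready() until it returns True or timeout_s elapses.

    is_ready: zero-argument callable returning True once the service
              behind the autoscaling group is reachable (hypothetical;
              e.g. an HTTP health check or a boto3 ASG status query).
    Returns True if the service came up in time, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready():
            return True
        # Sleep between polls instead of spinning, so no thread is
        # pegged while the autoscaling group comes up.
        time.sleep(poll_s)
    return False
```

Called on the driver before kicking off the Spark stages that hit the service, e.g. `wait_until_ready(check_service, timeout_s=600)`.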

On Thu, Feb 3, 2022 at 6:59 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Sounds like you are running this on a Google Dataproc cluster (Spark 3.1.2)
> with an autoscaling policy?
>
> Can you describe whether this happens before Spark starts a new job on the
> cluster or somehow halfway through processing an existing job?
>
> Also, does the job involve Spark Structured Streaming?
>
> HTH
>
>
>
>
>
>
>
>
>
>
> On Thu, 3 Feb 2022 at 21:29, Maksim Grinman <m...@resolute.ai> wrote:
>
>> We've got a spark task that, after some processing, starts an autoscaling
>> group and waits for it to be up before continuing processing. While waiting
>> for the autoscaling group, Spark starts throwing full thread dumps,
>> presumably at the spark.executor.heartbeatInterval. Is there a way to
>> prevent the thread dumps?
>>
>> --
>> Maksim Grinman
>> VP Engineering
>> Resolute AI
>>
>

-- 
Maksim Grinman
VP Engineering
Resolute AI
