[ https://issues.apache.org/jira/browse/SPARK-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514303#comment-15514303 ]

Josh Rosen commented on SPARK-16830:
------------------------------------

Do you have stack traces from the failed block fetches? I'd like to see whether 
this may be fixed by a recent patch of mine that helps avoid failures when all 
locations of non-shuffle blocks are lost or unavailable.
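
In the meantime, the retry behaviour on the fetching side can be bounded through 
the block transfer retry settings. A minimal sketch, assuming the standard 
spark.shuffle.io.* retry knobs (used by the Netty block transfer service for 
remote block fetches) also cover the non-shuffle block fetches described here; 
the values below are illustrative only, not a recommendation:

    import org.apache.spark.SparkConf

    // Illustrative values only: bound how long an executor keeps retrying
    // block fetches against an unreachable host.
    val conf = new SparkConf()
      .set("spark.shuffle.io.maxRetries", "2")  // retries per failed fetch (default 3)
      .set("spark.shuffle.io.retryWait", "3s")  // wait between retries (default 5s)
      .set("spark.network.timeout", "60s")      // general network timeout (default 120s)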

> Executors Keep Trying to Fetch Blocks from a Bad Host
> -----------------------------------------------------
>
>                 Key: SPARK-16830
>                 URL: https://issues.apache.org/jira/browse/SPARK-16830
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Streaming
>    Affects Versions: 1.6.2
>         Environment: EMR 4.7.2
>            Reporter: Renxia Wang
>
> When a host becomes unreachable, the driver removes the executors and block 
> managers on that host because it stops receiving heartbeats. However, 
> executors on other hosts keep trying to fetch blocks from the bad host. 
> I am running a Spark Streaming job that consumes data from Kinesis. As a result 
> of this block fetch retrying and failing, I started seeing 
> ProvisionedThroughputExceededException on shards, AmazonHttpClient (to 
> Kinesis) SocketException, Kinesis ExpiredIteratorException, etc. 
> This issue also exposes a potential memory leak: from the time the bad host 
> became unreachable, the physical memory usage of the executors that kept 
> trying to fetch blocks from that host kept increasing until they hit the 
> physical memory limit and were killed by YARN. 


