[GitHub] spark pull request: [SPARK-3796] Create external service which can...

aarondav Sat, 01 Nov 2014 00:05:38 -0700

Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/3001#issuecomment-61360218
  
    @rxin Please take a look at my last commit, fd3928b. It contains some 
critical code that handles a couple of cases I describe below.
    
    I have completed performance and correctness testing on a real cluster. 
Performance-wise, I saw no regression from the in-executor version. 
Additionally, I saw minimal memory usage from the Worker, where I put the 
server -- I ran several medium-sized shuffles using Workers with 512MB max heap 
sizes (the default) without noticeable garbage collection or heap growth.
    
    During testing, I noticed that we were dropping map outputs whenever I 
killed an executor in the middle of a map or reduce phase (I had not seen this 
in local testing because my queries ran so quickly that I always killed the 
executor after the job had completed). This caused us to recompute the map 
tasks unnecessarily. I have added code that drops map outputs only if (1) 
external shuffle is disabled or (2) we're responding to a fetch failure 
specifically.
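
    As a rough illustration of that rule, here is a minimal Python sketch. It 
is not the code from the commit; `MapOutputRegistry`, `on_executor_lost`, and 
the `fetch_failed` flag are hypothetical names chosen for the example, and the 
real scheduler tracks considerably more state.

```python
from dataclasses import dataclass, field


@dataclass
class MapOutputRegistry:
    """Toy stand-in for the driver's record of map outputs per executor (hypothetical)."""

    external_shuffle_enabled: bool
    outputs_by_executor: dict = field(default_factory=dict)

    def on_executor_lost(self, executor_id: str, fetch_failed: bool = False) -> bool:
        """Drop the executor's map outputs only if (1) or (2) from the text holds.

        Returns True when the outputs were dropped, meaning the map tasks
        would have to be recomputed.
        """
        # (1) external shuffle disabled, or (2) reacting to a fetch failure.
        should_drop = (not self.external_shuffle_enabled) or fetch_failed
        if should_drop:
            self.outputs_by_executor.pop(executor_id, None)
        return should_drop
```

    With external shuffle enabled, calling `on_executor_lost("exec-1")` leaves 
the registered outputs in place, whereas passing `fetch_failed=True` removes 
them.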


