GitHub user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/1618#issuecomment-54385505
@YanTangZhai Thanks for your PR, but I'm still not sure we want this
feature. If the data is heavily skewed, the user can already discover that
from the logs, since we log how many times we have spilled so far. Some
applications legitimately spill many times: imagine a huge dataset running
on somewhat beefy nodes. Every spill does real work, but simply by virtue
of the data's scale the application could cross any spill-count limit.
Killing it at that point would be unexpected behavior for these
applications, and we might fail a job after many hours of execution on a
false alarm. Do you see my point?
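
For context, here is a minimal Scala sketch contrasting the two behaviors
under discussion. This is not Spark's actual implementation; the names
`logSpill`, `checkThreshold`, `spillCount`, and `maxSpills` are all
hypothetical, and the log message is illustrative only.

```scala
// A sketch (not Spark code) of logging spills vs. failing on a spill-count
// threshold. All names and messages here are hypothetical.
object SpillPolicySketch {
  var spillCount = 0

  // Current behavior: each spill is logged, so a user can detect skew
  // from the logs without the job being interrupted.
  def logSpill(bytes: Long): Unit = {
    spillCount += 1
    println(s"Spilling in-memory map of $bytes bytes to disk (spill #$spillCount)")
  }

  // Proposed behavior: fail the application once spills exceed a threshold.
  // For a huge dataset this can fire after hours of legitimate work.
  def checkThreshold(maxSpills: Int): Unit = {
    if (spillCount > maxSpills) {
      throw new RuntimeException(
        s"Spilled $spillCount times, exceeding limit of $maxSpills")
    }
  }

  def main(args: Array[String]): Unit = {
    for (_ <- 1 to 5) logSpill(bytes = 64L * 1024 * 1024)
    checkThreshold(maxSpills = 3) // throws: 5 spills > limit of 3
  }
}
```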