[GitHub] spark issue #16142: [SPARK-18716][CORE] Restrict the disk usage of spark eve...

vanzin Thu, 15 Dec 2016 11:06:36 -0800

Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16142
  
    I'm not such a big fan of this feature, but mostly I'm not a big fan of the 
current implementation.
    
    For the feature, it feels like it's trying to make the SHS more like a "log 
management system" than a history server. There were already people who were 
concerned about this when the existing cleaner functionality was added. But 
I'll entertain the thought, even though you can probably get pretty close to 
this by using time-based deletion with a shorter max age, coupled with log 
compression (both features that already exist).
    
    For the implementation, you cannot delete things just based on size. You 
need to account for time too; you have to delete older logs first, otherwise 
you risk deleting the logs for just finished applications instead of a large 
log that's been sitting there for months. It's also a reactive change; you're
already using more space than you want to, and the history server will only bring
that down eventually.
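    A minimal sketch of the oldest-first policy described above, assuming each
log is known by its size and last-modified time. `LogInfo` and
`select_logs_to_delete` are hypothetical names for illustration only; this is
not the PR's code or Spark's history provider API. It only shows why ordering by
age matters: a purely size-driven choice could evict the large, just-finished
log, while this sorts by modification time and stops as soon as the total fits
under the cap.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogInfo:
    # Hypothetical record for one event log: path, size in bytes,
    # and last-modified time (epoch milliseconds).
    path: str
    size: int
    mod_time: int


def select_logs_to_delete(logs: List[LogInfo], max_total_bytes: int) -> List[LogInfo]:
    """Pick logs to delete, oldest first, until the total size fits the cap."""
    total = sum(log.size for log in logs)
    to_delete = []
    # Visit logs from oldest to newest so recently finished applications
    # are the last candidates for deletion.
    for log in sorted(logs, key=lambda l: l.mod_time):
        if total <= max_total_bytes:
            break
        to_delete.append(log)
        total -= log.size
    return to_delete


# Example: the newest log is also the largest, but it survives.
logs = [
    LogInfo("app-new", 500, 3000),
    LogInfo("app-old", 100, 1000),
    LogInfo("app-mid", 300, 2000),
]
victims = select_logs_to_delete(logs, 500)
print([v.path for v in victims])
```

    Note that this is still reactive in the sense described above: it only runs
after the cap has been exceeded, so disk usage can overshoot between scans.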
    
    Your change will also bombard the NameNode with requests on every scan, to
get the size of each log.
    
    At this point I'm not so convinced of the usefulness of the feature, and 
implementing it correctly will be a larger change than you have here.


