Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2471#issuecomment-58885115
  
    > @mattf I understand what you're trying to say, but think about it in context. As I said above, the "when to poll the file system" code is the most trivial part of this change. The only advantage of using cron for that is that you'd have more scheduling options - e.g., absolute times instead of a period.
    > 
    > To achieve that, you'd be considerably complicating everything else. You'd be creating a new command line tool in Spark that needs to deal with command line arguments, be documented, and handle security settings (e.g. Kerberos) - so it's more burden for everybody, maintainers of the code and admins alike.
    > 
    > And all that for a trivial and, I'd say, not really needed gain in functionality.
    
    @aw-altiscale pointed me to Camus, which has a nearly separable component, camus-sweeper: https://github.com/linkedin/camus/tree/master/camus-sweeper
    
    my objection is to the architecture and the responsibilities of the Spark components. i don't object to having the functionality.
    
    i think you should implement the ability to sweep/rotate/clean log files in HDFS, but not as part of a Spark process.


