GitHub user mattf commented on the pull request:

https://github.com/apache/spark/pull/2471#issuecomment-58885115

> @mattf I understand what you're trying to say, but think about it in context. As I said above, the "when to poll the file system" code is the most trivial part of this change. The only advantage of using cron for that is that you'd have more scheduling options - e.g., absolute times instead of a period.
>
> To achieve that, you'd be considerably complicating everything else. You'd be creating a new command line tool in Spark, that needs to deal with command line arguments, be documented, and handle security settings (e.g. kerberos) - so it's more burden for everybody, maintainers of the code and admins alike.
>
> And all that for a trivial, and I'd say, not really needed gain in functionality.

@aw-altiscale pointed me to camus, which has a nearly separable component: https://github.com/linkedin/camus/tree/master/camus-sweeper

my objection to this is about the architecture and responsibilities of the spark components. i don't object to having the functionality. i think you should implement the ability to sweep/rotate/clean log files in hdfs, but not as part of a spark process.