Github user mattf commented on the pull request:
https://github.com/apache/spark/pull/2471#issuecomment-57629838
> @mattf don't know what you mean by "functionality that is already
> provided by the system". I'm not aware of HDFS having any way to
> automatically do housekeeping of old files.
a system approach means using something like logrotate, or a cleaner process
run from cron.
such an approach is beneficial in a number of ways, including reducing the
complexity of spark by not duplicating functionality that's already available
in spark's environment - akin to using a standard library for i/o instead of
interacting with devices directly. in this case the environment is the
system, where tools like logrotate and cron are readily available.
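to make that concrete, here's a minimal logrotate sketch for local spark logs (the path, retention, and options are assumptions for illustration, not something from this thread):

```
# /etc/logrotate.d/spark (hypothetical): rotate spark worker/driver logs
# daily, keep 7 compressed generations, tolerate missing or empty files.
/var/log/spark/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
```

logrotate is then driven by the system's daily cron job, so spark itself carries no rotation logic at all.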
as for rotating logs in hdfs - i wouldn't expect hdfs itself to provide such
a feature, because it serves a specific use case built on top of hdfs. some
searching turns up a few existing solutions for rotating or pruning files in
hdfs, and since hdfs is distributed, rotating/pruning/cleaning/purging can be
done remotely and independently of spark.
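for the hdfs case, a cron-driven cleanup could be as small as the shell sketch below. the directory, retention window, and script name are assumptions for illustration; `hdfs dfs -ls` and `hdfs dfs -rm` are the standard client commands, and the prune step is skipped when no hdfs client is on PATH:

```shell
#!/bin/sh
# Hypothetical cleanup script: prune spark event logs in HDFS that are
# older than RETENTION_DAYS. Run it from cron, e.g.:
#   0 3 * * * /usr/local/bin/prune-spark-logs.sh
RETENTION_DAYS=${RETENTION_DAYS:-7}
LOG_DIR=${LOG_DIR:-/spark/event-logs}

# Timestamp (seconds since epoch) before which files are deleted (GNU date).
cutoff=$(date -d "$RETENTION_DAYS days ago" +%s)

prune() {
  # `hdfs dfs -ls` prints: perms repl owner group size date time path;
  # the leading "Found N items" line leaves $f empty and is skipped.
  hdfs dfs -ls "$LOG_DIR" 2>/dev/null | while read -r _ _ _ _ _ d t f; do
    [ -n "$f" ] || continue
    ts=$(date -d "$d $t" +%s) || continue
    [ "$ts" -lt "$cutoff" ] && hdfs dfs -rm -skipTrash "$f"
  done
}

# Only attempt the prune when an HDFS client is actually installed.
if command -v hdfs >/dev/null 2>&1; then
  prune
fi
```

because the hdfs client talks to the namenode over the network, this can run on any admin host - nothing about it needs to live inside spark.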