Timothy, why are you writing application logs to HDFS? If you want to analyze these logs later, you can write them to local storage on your slave nodes and later rotate those files to a suitable location. If they are only going to be useful for debugging the application, you can always remove them periodically. Thanks, Dev
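For the local-storage approach, a rolling appender keeps the logs bounded and handles the periodic cleanup automatically. A minimal sketch of a log4j.properties fragment (Spark ships with log4j 1.2; the file path, size cap, and backup count below are placeholders, not values from this thread):

```properties
# Route application logs to a size-capped rolling file on local disk.
# RollingFileAppender deletes the oldest backup once MaxBackupIndex is hit,
# so disk usage per node stays under roughly MaxFileSize * (MaxBackupIndex + 1).
log4j.rootLogger=INFO, rolling
log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.File=/var/log/myapp/app.log
log4j.appender.rolling.MaxFileSize=100MB
log4j.appender.rolling.MaxBackupIndex=10
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

With these example settings each node holds at most ~1.1 GB of logs, and older files are dropped without any separate cleanup job.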
On Mar 6, 2017 9:48 AM, "Timothy Chan" <tc...@lumoslabs.com> wrote:
> I'm running a single worker EMR cluster for a Structured Streaming job.
> How do I deal with my application log filling up HDFS?
>
> /var/log/spark/apps/application_1487823545416_0021_1.inprogress
>
> is currently 21.8 GB