GitHub user cnZach opened a pull request:

    https://github.com/apache/spark/pull/16293

    [SPARK-17119][Core] Allow the history server to delete .inprogress 
files (configurable)

    ## What changes were proposed in this pull request?
    The History Server (HS) currently only considers completed applications 
when deleting event logs from spark.history.fs.logDirectory (since SPARK-6879). 
As a result, .inprogress files (from failed jobs, jobs where the SparkContext 
is never closed, spark-shell exits, etc.) can accumulate over time and impact 
the HS.
    
    Instead of requiring users to delete these files manually, this change 
adds a configurable feature that lets users decide whether .inprogress files 
should also be deleted after a period of time:
    spark.history.fs.cleaner.deleteInProgress.enabled
    spark.history.fs.cleaner.noProgressMaxAge
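
A minimal sketch of the selection rule these two settings imply, written in Python for illustration only. It is not the Scala code in this patch: the names `EventLog` and `stale_in_progress_logs`, the millisecond units, and the use of the last-modified time as the age reference are all assumptions.

```python
from dataclasses import dataclass
from typing import List

IN_PROGRESS_SUFFIX = ".inprogress"
DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class EventLog:
    """Hypothetical stand-in for an event log file in the log directory."""
    name: str
    last_modified_ms: int


def stale_in_progress_logs(
    logs: List[EventLog],
    now_ms: int,
    max_age_ms: int,
    delete_in_progress_enabled: bool,
) -> List[str]:
    """Return names of .inprogress logs older than max_age_ms.

    delete_in_progress_enabled plays the role of
    spark.history.fs.cleaner.deleteInProgress.enabled, and max_age_ms the
    role of spark.history.fs.cleaner.noProgressMaxAge (assumed mapping).
    """
    if not delete_in_progress_enabled:
        # Feature switched off: in-progress logs are never selected.
        return []
    return [
        log.name
        for log in logs
        if log.name.endswith(IN_PROGRESS_SUFFIX)
        and now_ms - log.last_modified_ms > max_age_ms
    ]


# Example: with a 7-day limit, only the 10-day-old .inprogress log is selected.
now = 10 * DAY_MS
logs = [
    EventLog("app-1", 0),                        # completed log, ignored here
    EventLog("app-2.inprogress", 0),             # 10 days old
    EventLog("app-3.inprogress", 9 * DAY_MS),    # 1 day old
]
print(stale_in_progress_logs(logs, now, 7 * DAY_MS, True))
# prints ['app-2.inprogress']
```

Completed logs are deliberately left out of the sketch, since their cleanup already exists and is not changed by the description above.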
    
    ## How was this patch tested?
    
    Verified with manual tests.
    Unit tests were added in FsHistoryProviderSuite.scala, but I was not able 
to run ./dev/run-tests for the whole project on my laptop: it failed on 
SparkSinkSuite and the network-related tests under org.apache.spark.network.* 
(all due to java.io.IOException: Failed to connect to /<my_laptop_ip>:62343).
    <code>
    [info] SparkSinkSuite:
    [info] - Success with ack *** FAILED *** (1 minute)
    [info]   java.io.IOException: Error connecting to /0.0.0.0:62298
    [info]   at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)
    </code>
    
    ## doc ##
    monitoring.md is also updated

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cnZach/spark SPARK-17119

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16293.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16293
    
----
commit aa45caa42a7bc1b4a06e6634f9a40c4db6b83a89
Author: Yuexin Zhang <[email protected]>
Date:   2016-12-15T06:19:11Z

    allow the history server to delete .inprogress files and make it 
configurable

commit f281d92a49e54f64f157f8d2936a13a73c7284cb
Author: Yuexin Zhang <[email protected]>
Date:   2016-12-15T06:39:12Z

    fix a typo noProgressMaxAg -> noProgressMaxAge

commit 989422d310a0addeb25217e61fda85c34e5d4c89
Author: Yuexin Zhang <[email protected]>
Date:   2016-12-15T06:41:57Z

    fix checkstyle failures in FsHistoryProviderSuite.scala

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
