GitHub user zsxwing opened a pull request:

    [SPARK-15698][SQL][STREAMING] Add the ability to remove the old MetadataLog in FileStreamSource (branch-2.0)

    ## What changes were proposed in this pull request?
    Backport #13513 to branch-2.0.
    ## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull SPARK-15698-spark-2.0

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15163

commit 346def7c3895c1fe21dc7b51bdcc8dd06ed61fac
Author: jerryshao <>
Date:   2016-09-20T17:24:12Z

    [SPARK-15698][SQL][STREAMING] Add the ability to remove the old MetadataLog in FileStreamSource
    Currently, the `metadataLog` in `FileStreamSource` adds a checkpoint file
    for each batch but has no way to remove or compact old files, which leads
    to a large number of small files in long-running queries. This patch
    proposes compacting the old logs into one file. The approach is quite
    similar to `FileStreamSinkLog`, but simpler.
    Unit test added.
    Author: jerryshao <>
    Closes #13513 from jerryshao/SPARK-15698.
    (cherry picked from commit a6aade0042d9c065669f46d2dac40ec6ce361e63)
    Signed-off-by: Shixiong Zhu <>
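
The compaction idea described in the commit message can be sketched with a toy model. This is not Spark's code. The class name `CompactingLog`, the `.compact` suffix, the `compact_interval` parameter, and the JSON file layout are all assumptions made for illustration. The sketch writes one file per batch, and on every `compact_interval`-th batch it folds all earlier entries into a single file and deletes the older ones:

```python
import json
import os
import tempfile

COMPACT_SUFFIX = ".compact"  # hypothetical naming, chosen for this sketch


class CompactingLog:
    """Toy per-batch metadata log that periodically folds old files into one."""

    def __init__(self, directory, compact_interval=10):
        self.directory = directory
        self.compact_interval = compact_interval

    def _path(self, batch_id, is_compact=False):
        name = f"{batch_id}{COMPACT_SUFFIX}" if is_compact else str(batch_id)
        return os.path.join(self.directory, name)

    def _batches(self):
        # Sorted (batch_id, is_compact) pairs for every log file on disk.
        found = []
        for name in os.listdir(self.directory):
            is_compact = name.endswith(COMPACT_SUFFIX)
            stem = name[:-len(COMPACT_SUFFIX)] if is_compact else name
            if stem.isdigit():
                found.append((int(stem), is_compact))
        return sorted(found)

    def all_entries(self):
        # Read the latest compact file (if any) plus every later batch file.
        batches = self._batches()
        compact_ids = [b for b, is_compact in batches if is_compact]
        start = max(compact_ids) if compact_ids else 0
        entries = []
        for batch_id, is_compact in batches:
            if batch_id >= start:
                with open(self._path(batch_id, is_compact)) as f:
                    entries.extend(json.load(f))
        return entries

    def add(self, batch_id, entries):
        # Batches interval-1, 2*interval-1, ... are written as compact files.
        if (batch_id + 1) % self.compact_interval == 0:
            merged = self.all_entries() + list(entries)
            with open(self._path(batch_id, is_compact=True), "w") as f:
                json.dump(merged, f)
            # Older files are now redundant; this sketch deletes them at once.
            for old_id, is_compact in self._batches():
                if old_id < batch_id:
                    os.remove(self._path(old_id, is_compact))
        else:
            with open(self._path(batch_id), "w") as f:
                json.dump(list(entries), f)


def demo():
    with tempfile.TemporaryDirectory() as d:
        log = CompactingLog(d, compact_interval=3)
        for batch_id in range(5):
            log.add(batch_id, [f"file-{batch_id}"])
        return sorted(os.listdir(d)), log.all_entries()
```

With `compact_interval=3`, calling `demo()` after five batches leaves three files on disk instead of five, and all five entries remain readable. A real implementation would likely also need to handle concurrent readers and failures partway through a compaction, which this sketch ignores.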

