GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/22339
SPARK-17159 Significant speed up for running spark streaming against Object store.

## What changes were proposed in this pull request?

Original work by Steve Loughran. Based on #17745.

This is a minimal patch to FileInputDStream that reduces the number of file status requests made when querying for new files. Each file status request costs three or more HTTP calls to the object store. The patch avoids most of these requests by working with the FileStatus objects already returned by the directory listing instead of looking each file up again. The saving is minor on ordinary filesystems but significant on object stores.

## How was this patch tested?

Tests included. Existing tests pass.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark PR_17745

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22339.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22339

----
commit 2fba9af597349fc023e04a845d1cfacfc3ab7d9e
Author: Steve Loughran <stevel@...>
Date:   2017-04-24T13:04:04Z

    SPARK-17159 Significant speed up for running spark streaming against Object store.

    Based on #17745. Original work by Steve Loughran.

    This is a minimal patch of changes to FileInputDStream to reduce File status requests when querying files.

    This is a minor optimisation when working with filesystems, but significant when working with object stores.

    Change-Id: I269d98902f615818941c88de93a124c65453756e

----
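The idea behind the patch can be illustrated with a toy Python model. This is for illustration only: it is not Spark's FileInputDStream code and does not use the Hadoop FileSystem API. `CountingObjectStore`, `find_new_files_by_path`, and `find_new_files_by_status` are hypothetical names, and the cost of 3 requests per status lookup is an assumption taken from the "3+ HTTP calls" figure above. The model contrasts filtering paths by re-fetching each file's status with filtering directly on the status objects that a listing already returns.

```python
from dataclasses import dataclass

# Assumed cost model (hypothetical): one directory listing = 1 HTTP request,
# one per-file status lookup = 3 requests (the "3+" lower bound above).
LIST_COST = 1
FILE_STATUS_COST = 3


@dataclass(frozen=True)
class FileStatus:
    """Minimal stand-in for a file status record: a path plus its mtime."""
    path: str
    modification_time: int


class CountingObjectStore:
    """Hypothetical in-memory store that counts simulated HTTP requests."""

    def __init__(self, files):
        self._files = dict(files)  # path -> modification time
        self.requests = 0

    def list_status(self, directory):
        # One listing call returns a status (including mtime) for every entry.
        self.requests += LIST_COST
        prefix = directory.rstrip("/") + "/"
        return [FileStatus(p, t) for p, t in sorted(self._files.items())
                if p.startswith(prefix)]

    def get_file_status(self, path):
        # A separate per-file lookup: several round trips on an object store.
        self.requests += FILE_STATUS_COST
        return FileStatus(path, self._files[path])


def find_new_files_by_path(store, directory, since):
    """Path-based filtering: list, keep only paths, then re-stat each path."""
    paths = [s.path for s in store.list_status(directory)]
    return [p for p in paths
            if store.get_file_status(p).modification_time > since]


def find_new_files_by_status(store, directory, since):
    """Status-based filtering: reuse the mtime already returned by the listing."""
    return [s.path for s in store.list_status(directory)
            if s.modification_time > since]


if __name__ == "__main__":
    files = {"data/a.txt": 100, "data/b.txt": 200, "data/c.txt": 300}

    old_store = CountingObjectStore(files)
    new_store = CountingObjectStore(files)

    print(find_new_files_by_path(old_store, "data", since=150), old_store.requests)
    # prints ['data/b.txt', 'data/c.txt'] 10
    print(find_new_files_by_status(new_store, "data", since=150), new_store.requests)
    # prints ['data/b.txt', 'data/c.txt'] 1
```

Under this assumed cost model, a directory holding N files costs 1 + 3N requests with the path-based approach and a single request with the status-based one. Real savings depend on the object store and its filesystem connector.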