[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Reynold, I know very well how much reviewers' time matters: I put 1+ h a day into reviewing on the Hadoop codebase, generally trying to review the work of non-colleagues, so as to pull in the broad set of contributions which are needed. I have been trying to get some object store related patches into Spark alongside the foundational work in fundamentally transforming how we work with object storage, especially S3, in Hadoop. Without the Spark-side changes, a lot gets lost: here the performance is approx. 100-300 ms/file when scanning an object store. Here I've split things in two, docs and diff. Both are independent, both are reasonably tractable. If they can be reviewed fast and added, there are no problems of patches ageing and everyone having to resync. We can get this out of the way, and you'll have fewer reasons to be unhappy with me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14731 Steve, I think the main point is you should also respect the time of reviewers. The way most of your pull requests manifest has been suboptimal: they often start with a very early WIP (which is not necessarily a problem), and once in a while (e.g. every month or two) you update it to almost completely change it. The time itself is a problem. It requires a lot of context switching to review your pull requests. In addition, every time you update it, it looks like a completely new giant pull request.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 OK. What is the way? Do I write a formal proposal? Because right now there is no reliable way to get the full dependency graph of Spark + Hadoop cloud JARs + direct cloud provider JARs (Azure, AWS) and their dependencies (Jackson) in sync. Which means that getting Spark to talk to object stores is more miss than hit. I'm happy to follow the proposal mechanism, including progress reports, but I do at least need some kind of hope that my work will actually get in.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen anything else I need to do here?
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Is there anything else I need to do here?
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74990/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #74990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74990/testReport)** for PR 14731 at commit [`a3aaf26`](https://github.com/apache/spark/commit/a3aaf267d2ac30c012b4a71b7a80e28a49ff10be). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #74990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74990/testReport)** for PR 14731 at commit [`a3aaf26`](https://github.com/apache/spark/commit/a3aaf267d2ac30c012b4a71b7a80e28a49ff10be).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Any more comments?
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The Hadoop FS Spec has now been updated to declare exactly what HDFS does w.r.t. timestamps, and to warn that what other filesystems and object stores do is implementation- and installation-specific: [filesystem.md](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md) That is the documentation update associated with this one; some of the content there was originally here, but was moved over to the Hadoop docs for the HDFS team to take the blame for when it changes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/14731 @srowen Waiting for your final OK
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73434/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #73434 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73434/testReport)** for PR 14731 at commit [`724495b`](https://github.com/apache/spark/commit/724495b97c1521ae5bd4c284d911c5ae6f51b19c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73433/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #73433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73433/testReport)** for PR 14731 at commit [`04f4967`](https://github.com/apache/spark/commit/04f49679b3f4f3e2d99e7cafeb9e4fa91fe98ece). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #73434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73434/testReport)** for PR 14731 at commit [`724495b`](https://github.com/apache/spark/commit/724495b97c1521ae5bd4c284d911c5ae6f51b19c).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #73433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73433/testReport)** for PR 14731 at commit [`04f4967`](https://github.com/apache/spark/commit/04f49679b3f4f3e2d99e7cafeb9e4fa91fe98ece).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @uncleGen: I've reviewed this and tweaked the docs slightly; otherwise, there's nothing left to do that I can see.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71866/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #71866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71866/testReport)** for PR 14731 at commit [`06b2bee`](https://github.com/apache/spark/commit/06b2beec75084db1ee330fa4ff4d50775d9f540c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @uncleGen I've updated it. Note that [HADOOP-13946](https://issues.apache.org/jira/browse/HADOOP-13946) tracks the changes in the Hadoop docs, which write down what HDFS actually does, then note how cloud object stores have no consistent behaviour w.r.t. timestamps. While I personally believe that direct PUT calls are the way to write data, there's still ambiguity then as to when the objects get a timestamp (S3: when the PUT/multipart put is first initiated, and not updated on the close() if the put was started earlier), and so as to when they become visible. So: I don't go into the details, just say "look at the docs, then test on your system". That's about as authoritative as you can get.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #71866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71866/testReport)** for PR 14731 at commit [`06b2bee`](https://github.com/apache/spark/commit/06b2beec75084db1ee330fa4ff4d50775d9f540c).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 let me do a quick review & update
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/14731 @steveloughran Are you still working on this?
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Sean, I think I've managed to delete the lines where you were asking about globs: > Am I right that the net change here is not an optimization but an expansion of the behavior to support globs rather than single dirs? There are no changes in this source to the expansion policy; that went in with [SPARK-14976](https://issues.apache.org/jira/browse/SPARK-14976), "make StreamingContext.textFileStream support wildcard". This updates the docs to cover what goes on (the wildcard covering directories, but not the files inside them), and makes the scan much more efficient on object stores. There are no changes in the semantics of what gets found or when things get found.
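A rough illustration of why the scan cost matters on object stores: a minimal Python sketch, not the code in this patch. It assumes the per-file cost comes from issuing one status request for every listed file, and compares that with a scan that reads the modification time straight from the listing. `FakeStore`, `Status` and both `scan_*` functions are hypothetical names invented for this sketch; they are not Spark or Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    path: str
    mod_time: int  # modification time, arbitrary integer units


class FakeStore:
    """Hypothetical in-memory stand-in for an object store.

    Every method call counts as one remote request.
    """

    def __init__(self, files):
        self._files = dict(files)  # path -> modification time
        self.requests = 0

    def list_paths(self, prefix):
        # Listing that returns names only.
        self.requests += 1
        return sorted(p for p in self._files if p.startswith(prefix))

    def list_statuses(self, prefix):
        # Listing that returns names together with their metadata.
        self.requests += 1
        return [Status(p, self._files[p])
                for p in sorted(self._files) if p.startswith(prefix)]

    def get_status(self, path):
        # One extra round trip per file.
        self.requests += 1
        return Status(path, self._files[path])


def scan_per_file(store, prefix, newer_than):
    """List names, then ask the store for each file's status separately."""
    return [p for p in store.list_paths(prefix)
            if store.get_status(p).mod_time > newer_than]


def scan_from_listing(store, prefix, newer_than):
    """Filter on the metadata that the listing already returned."""
    return [s.path for s in store.list_statuses(prefix)
            if s.mod_time > newer_than]


if __name__ == "__main__":
    files = {f"logs/part-{i:04d}": i for i in range(1000)}
    slow, fast = FakeStore(files), FakeStore(files)
    scan_per_file(slow, "logs/", 989)
    scan_from_listing(fast, "logs/", 989)
    print(slow.requests, fast.requests)
```

Both scans return the same files, but in this toy model the first issues one request per file plus the listing, while the second issues a single listing request. At the 100-300 ms per file quoted earlier in this thread, a directory of 1000 files would cost roughly 100-300 s per scan with the per-file approach.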
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70819/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #70819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70819/testReport)** for PR 14731 at commit [`a9a6f7b`](https://github.com/apache/spark/commit/a9a6f7b9e3876e551a2568b6220559992db40228). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #70819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70819/testReport)** for PR 14731 at commit [`a9a6f7b`](https://github.com/apache/spark/commit/a9a6f7b9e3876e551a2568b6220559992db40228).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 @srowen have you got any comments on the last patch?
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66656/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #66656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66656/consoleFull)** for PR 14731 at commit [`f8ed8a3`](https://github.com/apache/spark/commit/f8ed8a3551d1eed5db5a22f5eeb484614036fefe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #66656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66656/consoleFull)** for PR 14731 at commit [`f8ed8a3`](https://github.com/apache/spark/commit/f8ed8a3551d1eed5db5a22f5eeb484614036fefe).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65592/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65592/consoleFull)** for PR 14731 at commit [`57f697d`](https://github.com/apache/spark/commit/57f697dc718e536f512c856b8e6c8239e1133fd5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65592/consoleFull)** for PR 14731 at commit [`57f697d`](https://github.com/apache/spark/commit/57f697dc718e536f512c856b8e6c8239e1133fd5).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65498/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65498/consoleFull)** for PR 14731 at commit [`735fc7c`](https://github.com/apache/spark/commit/735fc7c2343c08a323e3d213e611830e3b41ef04). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65498/consoleFull)** for PR 14731 at commit [`735fc7c`](https://github.com/apache/spark/commit/735fc7c2343c08a323e3d213e611830e3b41ef04).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The latest patch pulls out the shortcutting of the `globStatus` call when there are no wildcard characters in the path; this is closer to the original patch.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64662/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64662/consoleFull)** for PR 14731 at commit [`b60f175`](https://github.com/apache/spark/commit/b60f175b5ef058ed24b3ddaf9a85b899a5e33187). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64662/consoleFull)** for PR 14731 at commit [`b60f175`](https://github.com/apache/spark/commit/b60f175b5ef058ed24b3ddaf9a85b899a5e33187).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64534/ Test FAILed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test FAILed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64534/consoleFull)** for PR 14731 at commit [`4134620`](https://github.com/apache/spark/commit/4134620210e28a2e182397a9bc94ccb8c4d5ffc4). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64534/consoleFull)** for PR 14731 at commit [`4134620`](https://github.com/apache/spark/commit/4134620210e28a2e182397a9bc94ccb8c4d5ffc4).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64488/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64488/consoleFull)** for PR 14731 at commit [`fe40bd2`](https://github.com/apache/spark/commit/fe40bd2bf548ca973f9dcdf9426fb9834828f72b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64488/consoleFull)** for PR 14731 at commit [`fe40bd2`](https://github.com/apache/spark/commit/fe40bd2bf548ca973f9dcdf9426fb9834828f72b).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64486/consoleFull)** for PR 14731 at commit [`9bc0ea9`](https://github.com/apache/spark/commit/9bc0ea9734ccaf11c6306a3496c98be8cc20faab).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test FAILed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64486 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64486/consoleFull)** for PR 14731 at commit [`9bc0ea9`](https://github.com/apache/spark/commit/9bc0ea9734ccaf11c6306a3496c98be8cc20faab). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64486/ Test FAILed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test FAILed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64368/ Test FAILed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64368/consoleFull)** for PR 14731 at commit [`b63abfe`](https://github.com/apache/spark/commit/b63abfe32a5509f69c7f725a46b2e6ac8fb9cf1f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Having looked at the source code, `FileSystem.globStatus()` uses glob patterns, which are not the same as POSIX regular expressions. [org.apache.hadoop.fs.GlobPattern](http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.1/org/apache/hadoop/fs/GlobPattern.java#81) does the conversion. For the docs, I'll just use a `*` wildcard in the example rather than try anything more sophisticated.
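To make the glob-versus-regexp distinction concrete, here is a minimal sketch in Python. It uses Python's own `fnmatch` and `re` modules as stand-ins, which is an assumption for illustration only: Hadoop's `GlobPattern` has its own conversion rules and may differ in detail. The pattern and directory name are made up.

```python
import fnmatch
import re

pattern = "2016-*"   # hypothetical directory pattern
name = "2016-08"     # hypothetical directory name

# Read as a glob, "*" matches any run of characters.
glob_match = fnmatch.fnmatchcase(name, pattern)

# Read as a regular expression, "-*" means "zero or more hyphens",
# so the same string no longer matches the whole name.
regex_match = re.fullmatch(pattern, name) is not None

# A glob has to be translated before a regex engine can evaluate it.
translated = fnmatch.translate(pattern)
translated_match = re.match(translated, name) is not None

print(glob_match, regex_match, translated_match)
```

The same characters give different answers depending on which syntax is assumed, which is why a plain `*` is the safest thing to show in the docs.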
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64368/consoleFull)** for PR 14731 at commit [`b63abfe`](https://github.com/apache/spark/commit/b63abfe32a5509f69c7f725a46b2e6ac8fb9cf1f).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 The logic has become complex enough that it merits unit tests. I'm pulling it into `SparkHadoopUtil` itself and writing tests for the possible cases: simple path, glob matches one, glob matches 1+, glob doesn't match, file not found.
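The five cases listed above can be sketched against a local directory tree. This is only an illustration in Python, not the Spark test suite: `resolve_dirs` and `run_cases` are hypothetical names, Python's `glob` module stands in for `FileSystem.globStatus()`, and the wildcard set is the one Python's `glob` understands rather than Hadoop's.

```python
import glob
import os
import tempfile

# Wildcard characters understood by Python's glob module (stand-in set).
GLOB_CHARS = set("*?[")


def resolve_dirs(path):
    """Return the sorted directories that `path` refers to."""
    if not any(ch in GLOB_CHARS for ch in path):
        # Plain path: check it directly and skip glob expansion.
        if not os.path.exists(path):
            raise FileNotFoundError(path)
        return [path] if os.path.isdir(path) else []
    return sorted(p for p in glob.glob(path) if os.path.isdir(p))


def run_cases():
    """Exercise the five cases against a throwaway local directory tree."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("sub1", "sub2", "other"):
            os.mkdir(os.path.join(root, name))

        def names(pattern):
            # Report base names only so results do not depend on the temp path.
            found = resolve_dirs(os.path.join(root, pattern))
            return [os.path.basename(p) for p in found]

        results = {
            "simple": names("sub1"),
            "glob_one": names("oth*"),
            "glob_many": names("sub*"),
            "glob_none": names("nope*"),
        }
        try:
            names("missing")
            results["not_found"] = False
        except FileNotFoundError:
            results["not_found"] = True
        return results


print(run_cases())
```

Note the asymmetry the sketch chooses: a glob that matches nothing returns an empty list, while a missing plain path raises. Whether the real code should treat the two differently is a design decision for the patch, not something this example settles.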
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64296/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64296/consoleFull)** for PR 14731 at commit [`79b57a2`](https://github.com/apache/spark/commit/79b57a2683dece86e1acd063b2d33fa5a6dd6038). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731

1. Updated the code to bypass the glob routine when there is no wildcard; this bypasses something fairly inefficient.
2. Reporting a FileNotFoundException (FNFE) on that base dir differently: skip the stack trace (maybe log at a lower level?).
3. Updated the docs with a special list of blobstore best practices. It's a bit hard to get the phrasing of what the wildcard does right; it needs careful review.

Tested using my S3 streaming test, which did use a `*` in the wildcard. All works, but there are no improvements in speed on what is a fairly unrealistic structure. Recursively listing object stores remotely is tangibly slow. Maybe that should go in the text too: "it can take seconds to scan object stores for new data, with the time being proportional to directory depth and the number of files in a directory. Shallow and wide directory trees are faster"
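The "shallow and wide is faster" advice can be illustrated with a toy model. The sketch below is Python and entirely hypothetical: it charges one simulated LIST request per directory visited and ignores per-file costs, paging and latency, so it shows only the directory-depth part of the claim and is not a measurement of any real object store.

```python
def count_list_requests(tree):
    """Count simulated LIST calls needed to walk `tree` recursively.

    `tree` maps names to either None (a file) or a nested dict (a directory).
    One request is charged per directory visited (an assumed cost model).
    """
    requests = 1  # listing this directory
    for child in tree.values():
        if isinstance(child, dict):
            requests += count_list_requests(child)
    return requests


def wide_tree(n_files):
    """All files in a single directory."""
    return {f"file{i}": None for i in range(n_files)}


def deep_tree(n_files):
    """One file per level, each level nested inside the next one up."""
    tree = {}
    for i in range(n_files):
        tree = {f"file{i}": None, "sub": tree} if tree else {f"file{i}": None}
    return tree


print("wide:", count_list_requests(wide_tree(8)))
print("deep:", count_list_requests(deep_tree(8)))
```

With the same number of files, the nested layout needs one request per level, while the flat layout needs a single request under this model.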
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64296/consoleFull)** for PR 14731 at commit [`79b57a2`](https://github.com/apache/spark/commit/79b57a2683dece86e1acd063b2d33fa5a6dd6038).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 I've now done the [s3a streaming test/example](https://github.com/steveloughran/spark/blob/features/SPARK-7481-cloud/cloud/src/main/scala/org/apache/spark/cloud/s3/examples/S3Streaming.scala). It uses a pattern of s3a/path/sub* as the directory path, creates a file in a directory, renames the directory to match the pattern, and then verifies that the file was found within the time period allocated: https://gist.github.com/steveloughran/c8b39a7b87a9bd63d7a383bda8687e7e Notably, the scan of the empty dir took 150ms; once there are two entries under the tree, one dir and one file, the time jumps up to 500ms. Summary stats show 72 getFileStatus calls at the FS API, mapping to 140 HEAD calls and 88 LIST operations.
```
S3AFileSystem{uri=s3a://stevel-ireland-new, workingDir=s3a://hwdev-steve-ireland-new/user/stevel, inputPolicy=sequential, partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=1048576, multiPartThreshold=2147483647, statistics {292 bytes read, 292 bytes written, 101 read ops, 0 large read ops, 11 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=343b706a-c238-4d71-9ed8-8083601ac28a-hwdev-steve-ireland-new} {fsURI=s3a://hwdev-steve-ireland-new} {files_created=1} {files_copied=1} {files_copied_bytes=292} {files_deleted=1} {directories_created=3} {directories_deleted=0} {ignored_errors=2} {op_copy_from_local_file=0} {op_exists=1} {op_get_file_status=72} {op_glob_status=16} {op_is_directory=0} {op_is_file=0} {op_list_files=0} {op_list_located_status=0} {op_list_status=27} {op_mkdirs=2} {op_rename=1} {object_copy_requests=0} {object_delete_requests=3} {object_list_requests=88} {object_continue_list_requests=0} {object_metadata_requests=140} {object_multipart_aborted=0} {object_put_bytes=292} {object_put_requests=4} {stream_read_fully_operations=0} {stream_bytes_skipped_on_seek=0} {stream_bytes_backwards_on_seek=0} {stream_bytes_read=292} {streamOpened=1} {stream_backward_seek_pperations=0} {stream_read_operations_incomplete=0} {stream_bytes_discarded_in_abort=0} {stream_close_operations=1} {stream_read_operations=1} {stream_aborted=0} {stream_forward_seek_operations=0} {streamClosed=1} {stream_seek_operations=0} {stream_bytes_read_in_close=0} {stream_read_exceptions=0} }}
```
I'm going to do a test run with the modification here and see what it does to the listing and status call counts.
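To make the ratios in the stats dump easier to read off, here is a minimal Scala sketch that pulls the integer `{name=value}` pairs out of such a dump and divides one counter by another. The object and method names (`S3AMetricsSketch`, `parse`, `perCall`) are hypothetical and not part of Hadoop or Spark, and attributing every HEAD request to `getFileStatus` is an assumption made only for the arithmetic.

```scala
object S3AMetricsSketch {
  // Matches each `{name=value}` pair whose value is an integer; pairs with
  // non-numeric values (Context, FileSystemId, fsURI) are skipped.
  private val Pair = """\{(\w+)=(\d+)\}""".r

  // Parse a metrics dump into a counter-name -> value map.
  def parse(dump: String): Map[String, Long] =
    Pair.findAllMatchIn(dump).map(m => m.group(1) -> m.group(2).toLong).toMap

  // Average number of object-store requests of one kind per FS API call.
  def perCall(requests: Long, calls: Long): Double =
    if (calls == 0) 0.0 else requests.toDouble / calls
}
```

With the figures quoted in the comment, `perCall(140, 72)` comes out at a little under two HEAD requests per `getFileStatus` call, which is the per-file overhead the patch is trying to avoid.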
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 Actually, I've just noticed that DStream behaviour isn't in sync with the streaming programming guide, which says "files written in nested directories not supported". That is: SPARK-14796 didn't patch the docs. It may as well be fixed in this patch. How about adding, in the bullet points underneath:
- Wildcards may be used to specify a set of directories to scan for new files, for example `hdfs://nn1:8050/users/alice/logs/2016-*/*.gz`
- New directories and their contents will be discovered as they arrive

Special points for object stores:
- Wildcard lookup may be very slow with some object stores.
- Directory rename is not atomic; if a directory is renamed into the streaming source, then the files within may only be discovered and processed across multiple streaming windows.

There's another optimisation as well: use the `SparkHadoopUtil.isGlobPath()` predicate to recognise when the dir path isn't a wildcard, in which case just do a simple `listFiles()`. Until that shortcutting is done automatically in the Hadoop FS implementation, Spark can do it on its side. As the `listFiles()` call was what was used before SPARK-14796, it has to be compatible, else SPARK-14796 has broken things.
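The shortcut proposed in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: `isGlobPath` here is a local stand-in for the predicate named in the comment, the set of glob characters is assumed, and `expandGlob` and `listDir` are hypothetical hooks standing in for the filesystem's glob expansion and plain directory listing.

```scala
object GlobShortcutSketch {
  // Characters that mark a path as a glob pattern. The exact set is an
  // assumption; it stands in for the check the isGlobPath predicate performs.
  private val GlobChars: Set[Char] = "{}[]*?\\".toSet

  // True if the path contains any glob character.
  def isGlobPath(path: String): Boolean = path.exists(GlobChars.contains)

  // expandGlob: hypothetical hook modelling a glob expansion to directories.
  // listDir:    hypothetical hook modelling a plain listing of one directory.
  def findFiles(
      path: String,
      expandGlob: String => Seq[String],
      listDir: String => Seq[String]): Seq[String] = {
    // Only pay for the (potentially slow) glob expansion when the path
    // really is a pattern; otherwise list the single directory directly.
    val dirs = if (isGlobPath(path)) expandGlob(path) else Seq(path)
    dirs.flatMap(listDir)
  }
}
```

For a plain path such as `s3a://bucket/path/sub`, `findFiles` never invokes `expandGlob`, which is the saving the comment is after on object stores where wildcard lookup is slow.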
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 LGTM. I was trying to see if there was a way to create a good test here by triggering the takes-too-long codepath and having a counter, but there's no obvious way to do that deterministically. I am doing a test for this against s3 in the spark-cloud module I'm writing; I can look at the printed counts of getFileStatus before/after the patch to see the difference, but the actual (testable) metrics are only accessible with the forthcoming Hadoop 2.8 release. TL;DR: there's no easy way to test this deterministically.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14731 This is ready to go right @steveloughran ? LGTM
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64156/consoleFull)** for PR 14731 at commit [`b08e3c9`](https://github.com/apache/spark/commit/b08e3c9937a63a08b274a1491ea7064168646f1d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64156/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64156/consoleFull)** for PR 14731 at commit [`b08e3c9`](https://github.com/apache/spark/commit/b08e3c9937a63a08b274a1491ea7064168646f1d).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64142/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64142/consoleFull)** for PR 14731 at commit [`6e8ace0`](https://github.com/apache/spark/commit/6e8ace0444ec9bdebc7c809a08628891f6de5fd0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14731 Ah right, you already have the modification time for free. Sounds good, remove the caching.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64142/consoleFull)** for PR 14731 at commit [`6e8ace0`](https://github.com/apache/spark/commit/6e8ace0444ec9bdebc7c809a08628891f6de5fd0).
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 To be precise: the caching of file modification times is superfluous. It's there to avoid the cost of executing `getFileStatus()` on previously scanned files. Once you use the `FileStatus` entries returned in a listing, you aren't calling `getFileStatus()` at all, hence there is no need to cache.
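The reasoning in this comment can be illustrated with a small in-memory model that counts calls. Everything here is a hypothetical stand-in: `Status` is a simplified substitute for Hadoop's `FileStatus`, and `CountingStore` is not a real filesystem client. The sketch only shows why a per-file lookup cost disappears once the listing's own status entries are used.

```scala
object ListingSketch {
  // Simplified stand-in for a file status: a path and a modification time.
  final case class Status(path: String, modTime: Long)

  // Hypothetical in-memory store that counts the two kinds of call.
  final class CountingStore(files: Seq[Status]) {
    var listCalls = 0
    var statusCalls = 0
    def listStatus(): Seq[Status] = { listCalls += 1; files }
    def getFileStatus(path: String): Status = {
      statusCalls += 1
      files.find(_.path == path).getOrElse(sys.error(s"no such file: $path"))
    }
  }

  // Per-file lookup pattern: list the paths, then fetch each status again.
  // This is the cost a modification-time cache would try to hide.
  def newFilesViaLookup(store: CountingStore, since: Long): Seq[String] =
    store.listStatus().map(_.path)
      .filter(p => store.getFileStatus(p).modTime >= since)

  // Listing-only pattern: filter on the statuses the listing already returned.
  def newFilesViaListing(store: CountingStore, since: Long): Seq[String] =
    store.listStatus().filter(_.modTime >= since).map(_.path)
}
```

Both functions return the same files; the difference is that the listing-only version issues no `getFileStatus` calls, so there is nothing left for a cache to save.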
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14731 Why is the caching superfluous -- because no file is evaluated more than once here?
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/14731 I'm going to scan through and tune them elsewhere; really I'm going by uses of the listFiles calls. There's actually no significant use elsewhere that I can see; just a couple of uses which filter on filename, so there is no cost penalty.
* `SparkHadoopUtil.listLeafStatuses()` does implement its own directory recursion to find files; `FileSystem.listFiles(path, true)` does that, and on S3A will do a flat scan that is O(files/5000), with no directory overhead at all.
* Otherwise, `globStatus()` can be pretty slow against object stores, but the fix there isn't in the client code; it means someone needs to implement [HADOOP-13371](https://issues.apache.org/jira/browse/HADOOP-13371), *S3A globber to use bulk listObject call over recursive directory scan*, or more specifically, an implementation scalable to production datasets.

Returning to this patch, should I cut out the caching? I think it is superfluous.
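As a rough worked example of the O(files/5000) figure in this comment, the sketch below counts the paged LIST requests a flat scan would need and compares that with a lower bound for a recursive walk. The names are hypothetical, the default page size of 5000 is taken from the figure in the comment, and the tree-walk estimate assumes only one request per directory, which real clients may exceed.

```scala
object ListRequestSketch {
  // Number of paged LIST requests needed to enumerate `files` objects in one
  // flat scan, with at most `pageSize` keys returned per request. An empty
  // listing still costs one request.
  def flatScanRequests(files: Long, pageSize: Int = 5000): Long =
    if (files <= 0) 1L else (files + pageSize - 1) / pageSize

  // Lower bound for a recursive walk: at least one LIST per directory visited.
  // Any extra per-directory requests are not modelled here.
  def treeWalkRequests(directories: Long): Long = directories
}
```

For 100,000 files spread over 1,000 directories, the flat scan needs 20 requests under these assumptions, while the recursive walk needs at least 1,000.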
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14731 LGTM. Does this sort of change make sense elsewhere where `PathFilter` is used? I glanced at the others and it looked like a wash in other cases.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64140/ Test PASSed.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64140/consoleFull)** for PR 14731 at commit [`738c51b`](https://github.com/apache/spark/commit/738c51bb57f331c58a877aa20aa5e2beb1084114). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #64140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64140/consoleFull)** for PR 14731 at commit [`738c51b`](https://github.com/apache/spark/commit/738c51bb57f331c58a877aa20aa5e2beb1084114).