[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99862/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99862/testReport)** for PR 22952 at commit [`998e769`](https://github.com/apache/spark/commit/998e769c3b552c39736af7814f60be895dbd90d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class FileStreamSourceCleaner(fileSystem: FileSystem, sourcePath: Path,` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99862/testReport)** for PR 22952 at commit [`998e769`](https://github.com/apache/spark/commit/998e769c3b552c39736af7814f60be895dbd90d4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99855/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99855/testReport)** for PR 22952 at commit [`998e769`](https://github.com/apache/spark/commit/998e769c3b552c39736af7814f60be895dbd90d4). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class FileStreamSourceCleaner(fileSystem: FileSystem, sourcePath: Path,` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99855/testReport)** for PR 22952 at commit [`998e769`](https://github.com/apache/spark/commit/998e769c3b552c39736af7814f60be895dbd90d4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99851/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99851 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99851/testReport)** for PR 22952 at commit [`22dce0c`](https://github.com/apache/spark/commit/22dce0c1d17de360c0d887a0352c1e8f57761db7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class FileStreamSourceCleaner(fileSystem: FileSystem, sourcePath: Path,` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @zsxwing Please also take a look: I guess I addressed glob overlap issue as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99851/testReport)** for PR 22952 at commit [`22dce0c`](https://github.com/apache/spark/commit/22dce0c1d17de360c0d887a0352c1e8f57761db7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99833/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99833/testReport)** for PR 22952 at commit [`17b9b9a`](https://github.com/apache/spark/commit/17b9b9a043ead0d448048c88b30f544228bd230b). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class FileStreamSourceCleaner(fileSystem: FileSystem, sourceGlobFilters: Seq[GlobFilter])` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @gaborgsomogyi @steveloughran Please take a look at 17b9b9a043ead0d448048c88b30f544228bd230b which just leverages GlobFilter. You may find that when the depth of archive path is more than 2, there's no chance for final destination to be picked up from FileStreamSource: so most of usual cases overlap will not happen, as well as Spark can determine this as only comparing depths. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99833/testReport)** for PR 22952 at commit [`17b9b9a`](https://github.com/apache/spark/commit/17b9b9a043ead0d448048c88b30f544228bd230b). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22952 BTW: [HADOOP-15748](https://issues.apache.org/jira/browse/HADOOP-15748), *S3 listing inconsistency can raise NPE in globber* Could be backported to 2.8+; low risk --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22952 > Looks like we can leverage GlobPattern but it is marked as @Private. I will happily switch this from private to public/evolving if you submit a patch against hadoop-trunk; backport it. Most recent changes to that class were 2015 (!) HADOOP-12436 and jan 2017, HADOOP-13976, newline handling. Nobody is going to modify that class out of fear of breaking things. Much easier for me to review and commit than to write a patch myself and try to get it reviewed... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 I'm now also playing with Hadoop glob relevant classes to check whether final destination matches source path glob pattern or not. * Looks like we can leverage `GlobPattern` but it is marked as `@Private`. * `GlobFilter` is `@Public` but it only checks against `path.getName()` so it would only compare with the last component. If we would like to leverage this, we should split all components and compare with multiple filters. Will update the code and test once I find a viable approach. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR It was tested with S3 and the trick is to have HUGE amount of files. Listing files is pathologically bad as @steveloughran stated, glob is even worse. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @gaborgsomogyi That's really huge... Could you share how you tested? Like which FS (local/HDFS/S3/etc), directory structure, count of files... That would help me understanding the impact and also help on testing manually when we deal with optimization. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR It's a question what is not big deal, I've seen ~1 hour glob request when huge amount of files stored :) If file move is even worse one more reason to move it to separate thread. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @gaborgsomogyi @steveloughran OK. I'll change the approach to just check against final path for each moving. As @steveloughran stated, it may bring performance hit for each checking when dealing with object stores, so we may also need to provide a way to disable checking as well with caution. (Btw, if moving file in object store requires huge overhead rather than globing, slow globing may not be a big deal. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR @steveloughran As I see not only `*` and `?` missing but `[]` also. * Having glob parser in spark and supporting it I think it's too heavy and brittle. * Considering these I would solve it with warnings + caveat message in the doc (mentioning the slow globbing on object stores). As a separate offtopic just wondering how hadoop's globbing works if expander doesn't support all the glob elements. Maybe other operators (like `[]`) handled in different code part!? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @gaborgsomogyi @steveloughran `GlobExpander` only looks like handling `{}` pattern. We need to still deal with `*` and `?` which can't be expanded like this. It would only work if we would be OK with restricting descendants of multiple paths (for now we restrict descendants of one path), so while it would help fixing the bug of current patch, it might be still too restrictive. I think changing Hadoop version because of this costs too much. If we really would like to go, only viable solution is copying the code. (Actually we can also just reimplement it since its requirements are like a kind of assignment, though we may end up with similar code.) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22952 bq. GlobExpander is private that's correctable. 1. Make sure there are standalone tests (if none around) 1. Make sure that off filesystem.md there's something declaring normatively WTF it expands 1. Provide a patch moving from @Private to @Public/Evolving We can apply those changes to branch-2 and trunk, and, because the class is already public you can use it knowing that in the development line of hadoop we've committed to not moving, deleting or breaking it. As usual: file a HADOOP-* jira with the patch, link me to it, I'll do my best to review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR I've taken a look at the possibilities: * [GlobExpander](https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobExpander.java#L63) is private * `globStatus` recursively is not an option because of it's poor performance * `globStatus` with limited scope can be an option but there are cases where it might take some execution time * Print warnings and not moving files is an option which seems feasible --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @zsxwing @gaborgsomogyi What we were trying to do is enforcing archive path so that moved files will not make overlap with source path. There may be same file name with different directory so I'm also trying to persist its own path in final archived path, which means archive files will not be placed in same directory. Based on above, I thought enforcing archive path with checking glob path is not easy to do, because without knowing final archive path (per file) we can't check it matches with glob pattern. That's why I just would rather restrict all subdirectories instead of finding a way to check against glob pattern. Actually I'm a bit afraid that we might be putting too much complexity on enforcing archive path. If we are OK with not enforcing archive path and just verify the final archive path doesn't overlap source path per each source file, it would be simple to do. We can make Spark not moving file and log warning message to let end users specify other directory. Would like to hear everyone's thought and idea. Thanks in advance! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22952 > HDFS does not support it yet, though on the way, see https://issues.apache.org/jira/browse/HADOOP-10019 That's an old patch; I don't know of any active dev there. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22952 Hadoop FS glob filtering is pathologically bad on object stores. I have tried in the past to do an ~O(1) impl for S3 [HADOOP-13371](https://issues.apache.org/jira/browse/HADOOP-13371). While I could produce one which was efficient for test cases, it would suffer in the use case "selective pattern match at the top of a very wide tree", where you really do want to filter down aggressively for the topmost directory/directories. I think there you'd want to have a threshold as to how many path elements up you'd switch from ls dir + match into the full deep listfiles(recursive) scan Not looked at it for ages. If someone does want to play there, welcome to take it up --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR Related the glob part @zsxwing pointed out an important problem. Glob pattern is much more than checking `*` and `?`, see the link up. For simplicity take this test: ``` ... val sourcePath = "/hello/worl{d}" val archivePath = "/hello/world/spark" ... ``` This should throw `IllegalArgumentException` but proceeding without exception. A glob parser would be good to be used. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR I've taken a deeper look at the overlap thing and found the following. * Added an additional test which produced odd result: ``` ... val sourcePath = "/hello/worl" val archivePath = "/hello/world/spark" ... ``` This has thrown `IllegalArgumentException` but the `sourcePath` is different than `archivePath`. This happens without any glob magic. * This approach may not work if there are symlinks involved (`fs.makeQualified` doesn't make any link resolve). * HDFS does not support it yet, though on the way, see https://issues.apache.org/jira/browse/HADOOP-10019 * S3 and ADLS does not support it All in all this part is fine now. Checking the glob part... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99567/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99567/testReport)** for PR 22952 at commit [`79fa3e0`](https://github.com/apache/spark/commit/79fa3e0f8b3fc2a27165c3af02df96b6d4d354cb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99567/testReport)** for PR 22952 at commit [`79fa3e0`](https://github.com/apache/spark/commit/79fa3e0f8b3fc2a27165c3af02df96b6d4d354cb). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @zsxwing Yeah, it would be ideal we can enforce `archivePath` to which don't have any possibility to match against source path (glob), so my approach was to find directory which is the base directory without having glob in ancestor, and `archive path + base directory of source path` doesn't belong to sub-directory of found directory. For example, suppose source path is `/a/b/c/*/ef?/*/g/h/*/i`, then base directory of source path would be `/a/b/c`, and `archive path + base directory of source path` should not belong to sub-directory of `/a/b/c`. (My code has a bug for finding the directory so need to fix it.) This is not an elegant approach and the approach has false-positive, ending up restricting the archive path which actually doesn't make overlap (too restrict), but it would guarantee two paths never overlap. (So no need to re-check when renaming file.) I guess the approach might be reasonable because in practice end users would avoid themselves have to think about complicated case on overlaps, and just isolate two paths. What do you think about this approach? cc. @gaborgsomogyi Could you also help validating my approach? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/22952 > @zsxwing Btw, how do you think about addressing background move/deletion (I had thought and Yeah, this can be done in a separate ticket. I was playing with `org.apache.hadoop.fs.GlobFilter` to see how to detect the overlap. But one major issue is before getting the target path, we don't know whether a path will match [the glob pattern](https://self-learning-java-tutorial.blogspot.com/2016/01/hadoop-java-globbing.html) or not. The worst case, we can check the overlap when parsing the options for a normal path. For glob path, we can use `GlobFilter/GlobPattern` to check before doing the rename, in which case we can just use the target path. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99450/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99450/testReport)** for PR 22952 at commit [`f59c35a`](https://github.com/apache/spark/commit/f59c35aad5a0c2b436fe36c3116cce532fa0beee). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @zsxwing Btw, how do you think about addressing background move/deletion (I had thought and @gaborgsomogyi also suggested as well) into separate issue? I guess putting more feature would let you spend more time to review. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99450/testReport)** for PR 22952 at commit [`f59c35a`](https://github.com/apache/spark/commit/f59c35aad5a0c2b436fe36c3116cce532fa0beee). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @zsxwing Thanks for the detailed review! Addressed review comments. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR ok, feel free to ping me if review needed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @gaborgsomogyi Thanks for taking care, but I guess I can manage it. I'll ask for help when I can't go back to this one. This patch (latest change) hasn't get any feedback on committers yet so let's not rush on this and wait for it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR I'm fine with this, on the other hand if you're focusing on different things I'm happy to create a jira + PR for the separate thread thing to speed up processing. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99180/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99180/testReport)** for PR 22952 at commit [`007f5d5`](https://github.com/apache/spark/commit/007f5d53e7a130ae08bc36494a9f04c882e7c711). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #99180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99180/testReport)** for PR 22952 at commit [`007f5d5`](https://github.com/apache/spark/commit/007f5d53e7a130ae08bc36494a9f04c882e7c711). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @gaborgsomogyi Thanks for reviewing! I addressed your review comments except asynchronous cleanup, which might be able to break down to separated issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/22952 @gaborgsomogyi Yeah I also thought about the idea (commented above) but I've lost focus on other task. Given that smaller patch is better to be reviewed easily and current patch works well (except overheads on cleaning in same thread), would we split this up and address it to another issue? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98918/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #98918 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98918/testReport)** for PR 22952 at commit [`33c5681`](https://github.com/apache/spark/commit/33c5681ab022116133576e4e27c50e346c1ffba9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98919/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98917/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #98917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98917/testReport)** for PR 22952 at commit [`3f6b5fb`](https://github.com/apache/spark/commit/3f6b5fbf01b2e78dfc9ecf7e3b45ef771fec74a7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22952 **[Test build #98919 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98919/testReport)** for PR 22952 at commit [`ca26b41`](https://github.com/apache/spark/commit/ca26b4136adc09fb9015c973953b50d894fc8779). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org