[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99862/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99862 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99862/testReport)**
 for PR 22952 at commit 
[`998e769`](https://github.com/apache/spark/commit/998e769c3b552c39736af7814f60be895dbd90d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class FileStreamSourceCleaner(fileSystem: FileSystem, sourcePath: 
Path,`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99862 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99862/testReport)**
 for PR 22952 at commit 
[`998e769`](https://github.com/apache/spark/commit/998e769c3b552c39736af7814f60be895dbd90d4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-08 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
retest this, please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99855/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99855 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99855/testReport)**
 for PR 22952 at commit 
[`998e769`](https://github.com/apache/spark/commit/998e769c3b552c39736af7814f60be895dbd90d4).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class FileStreamSourceCleaner(fileSystem: FileSystem, sourcePath: 
Path,`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99855 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99855/testReport)**
 for PR 22952 at commit 
[`998e769`](https://github.com/apache/spark/commit/998e769c3b552c39736af7814f60be895dbd90d4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99851/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99851 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99851/testReport)**
 for PR 22952 at commit 
[`22dce0c`](https://github.com/apache/spark/commit/22dce0c1d17de360c0d887a0352c1e8f57761db7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class FileStreamSourceCleaner(fileSystem: FileSystem, sourcePath: 
Path,`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@zsxwing Please also take a look: I guess I addressed glob overlap issue as 
well.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99851 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99851/testReport)**
 for PR 22952 at commit 
[`22dce0c`](https://github.com/apache/spark/commit/22dce0c1d17de360c0d887a0352c1e8f57761db7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99833/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99833 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99833/testReport)**
 for PR 22952 at commit 
[`17b9b9a`](https://github.com/apache/spark/commit/17b9b9a043ead0d448048c88b30f544228bd230b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  class FileStreamSourceCleaner(fileSystem: FileSystem, 
sourceGlobFilters: Seq[GlobFilter])`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@gaborgsomogyi @steveloughran 
Please take a look at 17b9b9a043ead0d448048c88b30f544228bd230b which just 
leverages GlobFilter. You may find that when the depth of archive path is more 
than 2, there's no chance for final destination to be picked up from 
FileStreamSource: so most of usual cases overlap will not happen, as well as 
Spark can determine this as only comparing depths.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99833 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99833/testReport)**
 for PR 22952 at commit 
[`17b9b9a`](https://github.com/apache/spark/commit/17b9b9a043ead0d448048c88b30f544228bd230b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22952
  
BTW: [HADOOP-15748](https://issues.apache.org/jira/browse/HADOOP-15748), 
*S3 listing inconsistency can raise NPE in globber*

Could be backported to 2.8+; low risk


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-07 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22952
  
 > Looks like we can leverage GlobPattern but it is marked as @Private.

I will happily switch this from private to public/evolving if you submit a 
patch against hadoop-trunk; backport it. Most recent changes to that class were 
2015 (!) HADOOP-12436 and jan 2017, HADOOP-13976, newline handling. Nobody is 
going to modify that class out of fear of breaking things.

Much easier for me to review and commit than to write a patch myself and 
try to get it reviewed...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-06 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
I'm now also playing with Hadoop glob relevant classes to check whether 
final destination matches source path glob pattern or not.

* Looks like we can leverage `GlobPattern` but it is marked as `@Private`.
* `GlobFilter` is `@Public` but it only checks against `path.getName()` so 
it would only compare with the last component. If we would like to leverage 
this, we should split all components and compare with multiple filters.

Will update the code and test once I find a viable approach.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-06 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/22952
  
@HeartSaVioR It was tested with S3 and the trick is to have HUGE amount of 
files. Listing files is pathologically bad as @steveloughran stated, glob is 
even worse.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@gaborgsomogyi 
That's really huge... Could you share how you tested? Like which FS 
(local/HDFS/S3/etc), directory structure, count of files... That would help me 
understanding the impact and also help on testing manually when we deal with 
optimization.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-05 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/22952
  
@HeartSaVioR It's a question what is not big deal, I've seen ~1 hour glob 
request when huge amount of files stored :)
If file move is even worse one more reason to move it to separate thread.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@gaborgsomogyi @steveloughran 
OK. I'll change the approach to just check against final path for each 
moving. As @steveloughran stated, it may bring performance hit for each 
checking when dealing with object stores, so we may also need to provide a way 
to disable checking as well with caution. (Btw, if moving file in object store 
requires huge overhead rather than globing, slow globing may not be a big deal.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-05 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/22952
  
@HeartSaVioR @steveloughran 
As I see not only `*` and `?` missing but `[]` also.

* Having glob parser in spark and supporting it I think it's too heavy and 
brittle.
* Considering these I would solve it with warnings + caveat message in the 
doc (mentioning the slow globbing on object stores).

As a separate offtopic just wondering how hadoop's globbing works if 
expander doesn't support all the glob elements. Maybe other operators (like 
`[]`) handled in different code part!?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-04 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@gaborgsomogyi @steveloughran 
`GlobExpander` only looks like handling `{}` pattern. We need to still deal 
with `*` and `?` which can't be expanded like this. 

It would only work if we would be OK with restricting descendants of 
multiple paths (for now we restrict descendants of one path), so while it would 
help fixing the bug of current patch, it might be still too restrictive.

I think changing Hadoop version because of this costs too much. If we 
really would like to go, only viable solution is copying the code. (Actually we 
can also just reimplement it since its requirements are like a kind of 
assignment, though we may end up with similar code.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-04 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22952
  
bq. GlobExpander is private

that's correctable. 

1. Make sure there are standalone tests (if none around)
1. Make sure that off filesystem.md there's something declaring normatively 
WTF it expands
1. Provide a patch moving from @Private  to @Public/Evolving

We can apply those changes to branch-2 and trunk, and, because the class is 
already public you can use it knowing that in the development line of hadoop 
we've committed to not moving, deleting or breaking it.

As usual: file a HADOOP-* jira with the patch, link me to it, I'll do my 
best to review


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-04 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/22952
  
@HeartSaVioR 
I've taken a look at the possibilities:
* 
[GlobExpander](https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobExpander.java#L63)
 is private
* `globStatus` recursively is not an option because of it's poor performance
* `globStatus` with limited scope can be an option but there are cases 
where it might take some execution time
* Print warnings and not moving files is an option which seems feasible



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-03 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@zsxwing @gaborgsomogyi 
What we were trying to do is enforcing archive path so that moved files 
will not make overlap with source path. There may be same file name with 
different directory so I'm also trying to persist its own path in final 
archived path, which means archive files will not be placed in same directory.

Based on above, I thought enforcing archive path with checking glob path is 
not easy to do, because without knowing final archive path (per file) we can't 
check it matches with glob pattern. That's why I just would rather restrict all 
subdirectories instead of finding a way to check against glob pattern.

Actually I'm a bit afraid that we might be putting too much complexity on 
enforcing archive path. If we are OK with not enforcing archive path and just 
verify the final archive path doesn't overlap source path per each source file, 
it would be simple to do. We can make Spark not moving file and log warning 
message to let end users specify other directory.

Would like to hear everyone's thought and idea. Thanks in advance!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-03 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22952
  
> HDFS does not support it yet, though on the way, see 
https://issues.apache.org/jira/browse/HADOOP-10019

That's an old patch; I don't know of any active dev there.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-03 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/22952
  
Hadoop FS glob filtering is pathologically bad  on object stores. 
I have tried in the past to do an ~O(1) impl for S3 
[HADOOP-13371](https://issues.apache.org/jira/browse/HADOOP-13371). While I 
could produce one which was efficient for test cases, it would suffer in the 
use case "selective pattern match at the top of a very wide tree", where you 
really do want to filter down aggressively for the topmost 
directory/directories.

I think there you'd want to have a threshold as to how many path elements 
up you'd switch from ls dir + match into the full deep listfiles(recursive) 
scan 

Not looked at it for ages. If someone does want to play there, welcome to 
take it up


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-03 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/22952
  
@HeartSaVioR 
Related the glob part @zsxwing pointed out an important problem. Glob 
pattern is much more than checking `*` and `?`, see the link up. For simplicity 
take this test:
```
...
  val sourcePath = "/hello/worl{d}"
  val archivePath = "/hello/world/spark"
...
```
This should throw `IllegalArgumentException` but proceeding without 
exception.
A glob parser would be good to be used.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-03 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/22952
  
@HeartSaVioR 
I've taken a deeper look at the overlap thing and found the following.

* Added an additional test which produced odd result:
```
...
  val sourcePath = "/hello/worl"
  val archivePath = "/hello/world/spark"
...
```
This has thrown `IllegalArgumentException` but the `sourcePath` is 
different than `archivePath`. 
This happens without any glob magic.

* This approach may not work if there are symlinks involved 
(`fs.makeQualified` doesn't make any link resolve).
  * HDFS does not support it yet, though on the way, see 
https://issues.apache.org/jira/browse/HADOOP-10019
  * S3 and ADLS does not support it

  All in all this part is fine now.

Checking the glob part...



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99567/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99567 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99567/testReport)**
 for PR 22952 at commit 
[`79fa3e0`](https://github.com/apache/spark/commit/79fa3e0f8b3fc2a27165c3af02df96b6d4d354cb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99567 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99567/testReport)**
 for PR 22952 at commit 
[`79fa3e0`](https://github.com/apache/spark/commit/79fa3e0f8b3fc2a27165c3af02df96b6d4d354cb).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-01 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@zsxwing 
Yeah, it would be ideal we can enforce `archivePath` to which don't have 
any possibility to match against source path (glob), so my approach was to find 
directory which is the base directory without having glob in ancestor, and 
`archive path + base directory of source path` doesn't belong to sub-directory 
of found directory.

For example, suppose source path is `/a/b/c/*/ef?/*/g/h/*/i`, then base 
directory of source path would be `/a/b/c`, and `archive path + base directory 
of source path` should not belong to sub-directory of `/a/b/c`.
(My code has a bug for finding the directory so need to fix it.)

This is not an elegant approach and the approach has false-positive, ending 
up restricting the archive path which actually doesn't make overlap (too 
restrict), but it would guarantee two paths never overlap. (So no need to 
re-check when renaming file.)

I guess the approach might be reasonable because in practice end users 
would avoid themselves have to think about complicated case on overlaps, and 
just isolate two paths.

What do you think about this approach?

cc. @gaborgsomogyi Could you also help validating my approach? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-29 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/22952
  
> @zsxwing Btw, how do you think about addressing background move/deletion 
(I had thought and

Yeah, this can be done in a separate ticket.

I was playing with `org.apache.hadoop.fs.GlobFilter` to see how to detect 
the overlap. But one major issue is before getting the target path, we don't 
know whether a path will match [the glob 
pattern](https://self-learning-java-tutorial.blogspot.com/2016/01/hadoop-java-globbing.html)
 or not.

The worst case, we can check the overlap when parsing the options for a 
normal path. For glob path, we can use `GlobFilter/GlobPattern` to check before 
doing the rename, in which case we can just use the target path.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99450/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99450 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99450/testReport)**
 for PR 22952 at commit 
[`f59c35a`](https://github.com/apache/spark/commit/f59c35aad5a0c2b436fe36c3116cce532fa0beee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-29 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@zsxwing Btw, how do you think about addressing background move/deletion (I 
had thought and @gaborgsomogyi also suggested as well) into separate issue? I 
guess putting more feature would let you spend more time to review.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99450 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99450/testReport)**
 for PR 22952 at commit 
[`f59c35a`](https://github.com/apache/spark/commit/f59c35aad5a0c2b436fe36c3116cce532fa0beee).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-29 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@zsxwing Thanks for the detailed review! Addressed review comments.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/22952
  
@HeartSaVioR ok, feel free to ping me if review needed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@gaborgsomogyi 
Thanks for taking care, but I guess I can manage it. I'll ask for help when 
I can't go back to this one.
This patch (latest change) hasn't get any feedback on committers yet so 
let's not rush on this and wait for it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/22952
  
@HeartSaVioR I'm fine with this, on the other hand if you're focusing on 
different things I'm happy to create a jira + PR for the separate thread thing 
to speed up processing.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99180/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99180 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99180/testReport)**
 for PR 22952 at commit 
[`007f5d5`](https://github.com/apache/spark/commit/007f5d53e7a130ae08bc36494a9f04c882e7c711).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #99180 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99180/testReport)**
 for PR 22952 at commit 
[`007f5d5`](https://github.com/apache/spark/commit/007f5d53e7a130ae08bc36494a9f04c882e7c711).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@gaborgsomogyi 
Thanks for reviewing! I addressed your review comments except asynchronous 
cleanup, which might be able to break down to separated issue.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-22 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@gaborgsomogyi 
Yeah I also thought about the idea (commented above) but I've lost focus on 
other task. Given that smaller patch is better to be reviewed easily and 
current patch works well (except overheads on cleaning in same thread), would 
we split this up and address it to another issue?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98918/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #98918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98918/testReport)**
 for PR 22952 at commit 
[`33c5681`](https://github.com/apache/spark/commit/33c5681ab022116133576e4e27c50e346c1ffba9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98919/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22952
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98917/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #98917 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98917/testReport)**
 for PR 22952 at commit 
[`3f6b5fb`](https://github.com/apache/spark/commit/3f6b5fbf01b2e78dfc9ecf7e3b45ef771fec74a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-11-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22952
  
**[Test build #98919 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98919/testReport)**
 for PR 22952 at commit 
[`ca26b41`](https://github.com/apache/spark/commit/ca26b4136adc09fb9015c973953b50d894fc8779).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org