[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-17 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@jerryshao, can you backport this to branch 2.2 as well?

Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
Let me merge to master and branch 2.3. Thanks!


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@jerryshao and @steveloughran, thanks for your comments and review.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
LGTM. @merlintang please fix the PR title, thanks!


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85931/


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #85931 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85931/testReport)**
 for PR 19885 at commit 
[`296a19f`](https://github.com/apache/spark/commit/296a19fc5b1881959c7cf52b3c6e33eb7fa12b57).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #85931 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85931/testReport)**
 for PR 19885 at commit 
[`296a19f`](https://github.com/apache/spark/commit/296a19fc5b1881959c7cf52b3c6e33eb7fa12b57).


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19885
  
LGTM. Effective use of parameterization.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
@steveloughran @vanzin, please help review again.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #85902 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85902/testReport)**
 for PR 19885 at commit 
[`8c5f029`](https://github.com/apache/spark/commit/8c5f029b0b2e4dbb9912db6ce44eb6ff0ec31f6c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85902/


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #85902 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85902/testReport)**
 for PR 19885 at commit 
[`8c5f029`](https://github.com/apache/spark/commit/8c5f029b0b2e4dbb9912db6ce44eb6ff0ec31f6c).


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85897/


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #85897 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85897/testReport)**
 for PR 19885 at commit 
[`778e1ef`](https://github.com/apache/spark/commit/778e1ef903d46a46f3a389fc9b5bf8038ac7cb71).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #85897 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85897/testReport)**
 for PR 19885 at commit 
[`778e1ef`](https://github.com/apache/spark/commit/778e1ef903d46a46f3a389fc9b5bf8038ac7cb71).


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #85889 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85889/testReport)**
 for PR 19885 at commit 
[`31cfb88`](https://github.com/apache/spark/commit/31cfb88acfd17a3ce4481fcf32f6f2470a932bc1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85889/


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #85889 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85889/testReport)**
 for PR 19885 at commit 
[`31cfb88`](https://github.com/apache/spark/commit/31cfb88acfd17a3ce4481fcf32f6f2470a932bc1).


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@steveloughran can you review the added system test cases? 


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-02 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
My local test passes. I will set up a system test and update this soon.
Sorry about the delay.

On Tue, Jan 2, 2018 at 3:42 PM, Marcelo Vanzin 
wrote:

> Any updates?



---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-02 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19885
  
Any updates?


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-14 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
I am sorry for the delay in adding the test; I will update it soon.

On Thu, Dec 14, 2017 at 12:55 PM, UCB AMPLab 
wrote:

> Can one of the admins verify this patch?



---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-11 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19885
  
I think everybody is still waiting for the tests to be added to the PR.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-07 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19885
  
I'd recommend that the tests be parameterized, generating a separate test for 
each URI pair and including the values in any failure. Plan for a future where 
all you have is a stack trace from Jenkins and you want to work out what failed.

Because ScalaTest's `test(name: String)(testFun: => Any)` is not a function 
definition, just a method which registers a test, you can apply it to a 
sequence of path pairs and so generate a unique test for each one, something like:

```scala
  import java.net.URI

  val matching = Seq(
    ("files", "file:///file1", "file:///file2"),
    ("hdfs", "hdfs:/path1", "hdfs:/path1")
  )

  // Register one named test per URI pair; the failure message includes both URIs.
  matching.foreach { case (name, src, dst) =>
    test(name) {
      assert(Client.compareUri(new URI(src), new URI(dst)),
        s"No match between $src and $dst")
    }
  }
```

Plus the same for the non-matching sets, asserting that the comparison is false.
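For illustration, the same table-driven idea can be sketched in plain Scala, outside ScalaTest: each URI pair carries a name and an expected result, and every mismatch is reported with both URIs, so a Jenkins log alone is enough to see which pair broke. `UriCase`, `checkPairs` and the `sameAuthority` comparator are hypothetical names made up for this sketch; in the real suite the function under test would be `Client.compareUri`.

```scala
import java.net.URI

// One named case: two URIs and whether they are expected to denote the same filesystem.
final case class UriCase(name: String, src: String, dst: String, expected: Boolean)

// Hypothetical helper: runs every case through `compare` and returns one descriptive
// message per mismatch, naming the case and both URIs instead of a bare "assertion failed".
def checkPairs(cases: Seq[UriCase], compare: (URI, URI) => Boolean): Seq[String] =
  cases.flatMap { c =>
    val actual = compare(new URI(c.src), new URI(c.dst))
    if (actual == c.expected) None
    else Some(s"${c.name}: compare(${c.src}, ${c.dst}) returned $actual, expected ${c.expected}")
  }

// Toy comparator for illustration only (same scheme and same authority).
val sameAuthority: (URI, URI) => Boolean =
  (a, b) => a.getScheme == b.getScheme && a.getAuthority == b.getAuthority

val cases = Seq(
  UriCase("no authority", "file:///file1", "file:///file2", expected = true),
  UriCase("different hosts", "file://host/file1", "file://host2/file2", expected = false)
)

checkPairs(cases, sameAuthority).foreach(println) // prints nothing when every case passes
```

Keeping matching and non-matching pairs in one table, each with its expected result, means a new corner case is a one-line addition.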



---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #84594 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84594/testReport)**
 for PR 19885 at commit 
[`dfec0ac`](https://github.com/apache/spark/commit/dfec0ac812b690b79f416806409d96a65f4d9fe7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84594/


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
I have added this test case for URI comparison, based on Steve's comments. 
I have tested it in my local VM and it passes.

Meanwhile, for hdfs://namenode1/path1 and hdfs://namenode1:8020/path2, the 
default HDFS port number can be obtained, so they also match.

Below is the test case:

```scala
import java.net.URI

test("compare URI for filesystem") {

  // case 1
  var srcUri = new URI("file:///file1")
  var dstUri = new URI("file:///file2")
  assert(Client.compareUri(srcUri, dstUri))

  // case 2
  srcUri = new URI("file:///c:file1")
  dstUri = new URI("file://c:file2")
  assert(Client.compareUri(srcUri, dstUri))

  // case 3
  srcUri = new URI("file://host/file1")
  dstUri = new URI("file://host/file2")
  assert(Client.compareUri(srcUri, dstUri))

  // case 4
  srcUri = new URI("wasb://bucket1@user")
  dstUri = new URI("wasb://bucket1@user/")
  assert(Client.compareUri(srcUri, dstUri))

  // case 5
  srcUri = new URI("hdfs:/path1")
  dstUri = new URI("hdfs:/path2")
  assert(Client.compareUri(srcUri, dstUri))

  // case 6
  srcUri = new URI("file:///file1")
  dstUri = new URI("file://host/file2")
  assert(!Client.compareUri(srcUri, dstUri))

  // case 7
  srcUri = new URI("file://host/file1")
  dstUri = new URI("file:///file2")
  assert(!Client.compareUri(srcUri, dstUri))

  // case 8
  srcUri = new URI("file://host/file1")
  dstUri = new URI("file://host2/file2")
  assert(!Client.compareUri(srcUri, dstUri))

  // case 9
  srcUri = new URI("wasb://bucket1@user")
  dstUri = new URI("wasb://bucket2@user/")
  assert(!Client.compareUri(srcUri, dstUri))

  // case 10
  srcUri = new URI("wasb://bucket1@user")
  dstUri = new URI("wasb://bucket1@user2/")
  assert(!Client.compareUri(srcUri, dstUri))

  // case 11
  srcUri = new URI("s3a://user@pass:bucket1/")
  dstUri = new URI("s3a://user2@pass2:bucket1/")
  assert(!Client.compareUri(srcUri, dstUri))

  // case 12
  srcUri = new URI("hdfs://namenode1/path1")
  dstUri = new URI("hdfs://namenode1:8080/path2")
  assert(!Client.compareUri(srcUri, dstUri))

  // case 13
  srcUri = new URI("hdfs://namenode1:8020/path1")
  dstUri = new URI("hdfs://namenode1:8080/path2")
  assert(!Client.compareUri(srcUri, dstUri))
}
```
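One way a comparison could satisfy the thirteen cases above is sketched below, using only what `java.net.URI` exposes. This is a hypothetical `sameFsSketch`, not the code merged in this PR: it does no DNS lookup and no default-port filling, so unlike the default-port pairing described above, `hdfs://namenode1/path1` and `hdfs://namenode1:8020/path2` would not match here.

```scala
import java.net.URI
import java.util.Objects

// Hypothetical sketch: decides whether two URIs point at the same filesystem
// using only java.net.URI accessors (no DNS lookups, no default ports).
def sameFsSketch(src: URI, dst: URI): Boolean = {
  val srcAuth = src.getAuthority
  val dstAuth = dst.getAuthority
  if (src.getScheme == null || src.getScheme != dst.getScheme) {
    false // different (or missing) schemes are never the same filesystem
  } else if (srcAuth != null && !srcAuth.equalsIgnoreCase(dstAuth)) {
    false // the authority includes userInfo, so wasb containers and users are told apart
  } else {
    // A source with no authority ("file:///x", "hdfs:/x") only matches a destination
    // that also has no host and no port.
    Objects.equals(src.getHost, dst.getHost) && src.getPort == dst.getPort
  }
}
```

Treating a missing source authority as "compare host and port only" is what lets `file:///c:file1` match `file://c:file2`: `java.net.URI` parses `c:file2` as a registry-based authority with no host, so both sides end up with a null host and port -1.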




---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #84594 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84594/testReport)**
 for PR 19885 at commit 
[`dfec0ac`](https://github.com/apache/spark/commit/dfec0ac812b690b79f416806409d96a65f4d9fe7).


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
I see. Thanks for the explanation, @steveloughran. My concern is that the 
current changes will affect all filesystems, but we have only seen this issue 
with wasb. So limiting the authority comparison to wasb only would be safer.
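A minimal sketch of that narrower rule, assuming the schemes to special-case are `wasb` and `wasbs` (whose URIs take the form `container@account`): only those compare the full authority, userInfo included, while every other scheme ignores userInfo and compares host and port. `sameFsWasbOnly` and `authoritySensitiveSchemes` are hypothetical names, not Spark code.

```scala
import java.net.URI
import java.util.{Locale, Objects}

// Assumption: the schemes whose full authority (container@account) identifies the filesystem.
val authoritySensitiveSchemes = Set("wasb", "wasbs")

// Hypothetical sketch: compare the whole authority, userInfo included, only for wasb/wasbs;
// for every other scheme ignore userInfo and compare host and port only.
def sameFsWasbOnly(src: URI, dst: URI): Boolean = {
  val scheme = src.getScheme
  if (scheme == null || !scheme.equalsIgnoreCase(dst.getScheme)) {
    false
  } else if (authoritySensitiveSchemes.contains(scheme.toLowerCase(Locale.ROOT))) {
    Objects.equals(src.getAuthority, dst.getAuthority)
  } else {
    Objects.equals(src.getHost, dst.getHost) && src.getPort == dst.getPort
  }
}
```

The trade-off is that any other filesystem that starts to honor userInfo would have to be added to the set by hand.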


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19885
  
If you make a path of each of these and call getFileSystem() on them, you 
will end up with two different FS instances in the same JVM. But they'll both 
be talking to the same namenode using the UGI of whoever was the current user 
at the time getFileSystem() was called. That is: one cluster.

* Nobody should be using user@ in an HDFS URL; it doesn't do anything.
* For better or worse, it does in wasb (I wish they'd just used the 
hostname the way the others do).
* S3 does accept user:pass to put your full credentials in, but you get 
told off for doing this (it gets logged in too many places), and at some point 
we'll turn it off. Users shouldn't be doing it.

If someone really does refer to a source JAR with a `user1@` in an hdfs:// URL 
and the HDFS filesystem doesn't have a user, why not treat the FS as different 
and not worry about how the filesystem interprets it? It's a special case you 
aren't normally going to see. And the moment you try to do fs.makeQualified() 
between the two, you'll get a stack trace.

That is, this is not valid:
```
new Path("hdfs://us...@nn1.com:8020").getFileSystem(conf)
  .open(new Path("hdfs://us...@nn1.com:8020"))
```

You'll inevitably get a stack trace in makeQualified. (Non-normative 
statement; try it and see.)


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
>User info isn't picked up from the URL, it's taken off your Kerberos 
credentials. If you are running HDFS unkerberized, then UGI takes it from the 
environment variable HADOOP_USER_NAME.

I understand that userInfo is not picked up from the URL. But this PR uses 
the authority to compare filesystems, and the authority includes userInfo. So 
based on the code here, `hdfs://us...@nn1.com:8020` and 
`hdfs://us...@nn1.com:8020` are clearly two filesystems. But do these two URIs 
really belong to two actual HDFS clusters, or are they just two local 
filesystem objects?


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19885
  
@vanzin, it's too late for this, but I don't see any reason why 
`FileSystem.getCanonicalUri` should be kept protected. If someone wants to 
volunteer the spec changes to filesystem.md and the contract tests, they'll get 
support.

Looking at what HDFS does there, it calls out HA support as special, since 
you can't do DNS resolution on a logical URI:
```java
  protected URI canonicalizeUri(URI uri) {
if (HAUtilClient.isLogicalUri(getConf(), uri)) {
  // Don't try to DNS-resolve logical URIs, since the 'authority'
  // portion isn't a proper hostname
  return uri;
} else {
  return NetUtils.getCanonicalUri(uri, getDefaultPort());
}
  }
```

where `NetUtils.getCanonicalUri()` does a DNS lookup, caching previously 
canonicalized hosts via `SecurityUtil.getByName`. SecurityUtil is tagged as 
`@Public`; NetUtils isn't, but that could be relaxed while nobody is looking. 
But it doesn't address the big issue: different filesystems clearly have 
different rules about what "canonical" means, and you don't want to try to work 
them out and replicate them, as that is a moving maintenance target.
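To make the shape of that logic concrete, here is a Scala sketch under stated assumptions. `canonicalizeSketch` is hypothetical and is not `NetUtils.getCanonicalUri`; `isLogical` stands in for the HA check and `resolve` for DNS canonicalization. Both are passed in as functions so the example runs without a cluster or network, and 8020 is assumed as the default HDFS port.

```scala
import java.net.URI

// Hypothetical sketch loosely following the shape of the HDFS canonicalizeUri() quoted above.
def canonicalizeSketch(
    uri: URI,
    defaultPort: Int,
    isLogical: String => Boolean,
    resolve: String => String): URI = {
  val host = uri.getHost
  if (host == null || isLogical(host)) {
    uri // nothing to canonicalize, or an HA nameservice ID that is not a real hostname
  } else {
    // Fill in the default port when none is given, and canonicalize the hostname.
    val port = if (uri.getPort == -1) defaultPort else uri.getPort
    new URI(uri.getScheme, uri.getUserInfo, resolve(host), port,
      uri.getPath, uri.getQuery, uri.getFragment)
  }
}

// Toy resolver for illustration: expands one short name, leaves everything else alone.
val toyResolve: String => String = {
  case "nn1" => "nn1.example.com"
  case other => other
}
val noHa: String => Boolean = _ => false

val canonShort = canonicalizeSketch(new URI("hdfs://nn1/path1"), 8020, noHa, toyResolve)
val canonFull = canonicalizeSketch(new URI("hdfs://nn1.example.com:8020/path2"), 8020, noHa, toyResolve)
println(canonShort.getAuthority == canonFull.getAuthority) // prints true
```

Passing the resolver in keeps the sketch deterministic; a real implementation would still have to decide, per filesystem, what "canonical" means.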

I'm stuck at this point, so I created 
[HADOOP-15094](https://issues.apache.org/jira/browse/HADOOP-15094). 
Looking at `FileSystem.CACHE`: it compares on (scheme, authority, ugi), so it 
will actually return different FS instances for unqualified and qualified 
hosts. Maybe for this specific problem it's simplest to say "if you do that, 
don't expect things to work".
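A simplified model of a cache keyed on (scheme, authority, user) shows why unqualified and qualified hosts end up as different instances. `FsCacheKeySketch` and `cacheKeyOf` are hypothetical names; this is not Hadoop's actual cache implementation, the user is a plain string standing in for the UGI, and lower-casing the scheme and authority is a choice of this sketch.

```scala
import java.net.URI
import java.util.Locale

// Simplified, hypothetical model of a filesystem-instance cache key.
final case class FsCacheKeySketch(scheme: String, authority: String, user: String)

def cacheKeyOf(uri: URI, user: String): FsCacheKeySketch = {
  def norm(s: String): String = Option(s).map(_.toLowerCase(Locale.ROOT)).getOrElse("")
  FsCacheKeySketch(norm(uri.getScheme), norm(uri.getAuthority), user)
}

// The path is not part of the key, but the literal authority is: an unqualified and a
// fully qualified name for the same NameNode give two keys, hence two FS instances.
val shortKey = cacheKeyOf(new URI("hdfs://nn1/a"), "alice")
val fqdnKey = cacheKeyOf(new URI("hdfs://nn1.example.com/b"), "alice")
println(shortKey == fqdnKey) // prints false
```

Because userInfo is part of the authority, `hdfs://user1@nn1/` would also get its own key in this model, even though both instances talk to the same cluster.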


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19885
  
User info isn't picked up from the URL, it's taken off your Kerberos 
credentials. If you are running HDFS unkerberized, then UGI takes it from the 
environment variable `HADOOP_USER_NAME`. 

Looking at the `DFSClient` code, it's done in:

```
this.ugi = UserGroupInformation.getCurrentUser();
```

Also, looking at that code, I'm not convinced that using user@hdfs wouldn't 
end up causing confusion, especially in the world of HA HDFS. You'd really need 
to experiment to see if you could break things.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-05 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
I still have a question about this. For HDFS URIs like 
`hdfs://us...@nn1.com:8020` and `hdfs://us...@nn1.com:8020`, do we honor 
userInfo for HDFS filesystems? Are they two HDFS clusters, or just two FS cache 
objects on the local client side? @merlintang, did you try this in your local test?


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-05 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19885
  
Switching to comparing URIs should be ok if there's an easy way to 
canonicalize them. There's `FileSystem.getCanonicalUri`, but it's a protected 
method.

Otherwise it might be tricky to support all combinations (e.g. 
"hdfs://somehost/path" should be equal to "hdfs://somehost.mydomain.com/path").


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-05 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/19885
  
Hi.
If the comparison is isolated to a method testing URIs, rather than 
filesystems, it should be straightforward to write a suite of tests for this, 
with a list of URIs expected to match and a separate list of those expected to 
fail. That way we can review which combinations people expect to match or not 
match, see that they meet our expectations, and have somewhere to put new 
variants over time.

So: do that test, then we can see if the code does what's needed. Once 
that's done I'll use it as a basis for defining what Path is meant to do in the 
Hadoop FS spec & tests.

Things to check (expected to match):
```
file:///file1 file:///file2  : match; no auth
file:///c:file1 file://c:file2  : match; Windows cruft. This is the bit of Path which is most trouble
file://host/file1 file://host/file2
wasb://bucket1@user wasb://bucket1@user/
hdfs:/path1 hdfs:/path2  : "default" FS; may be patched by the time you get to FileSystem.getURI
hdfs://namenode1/path1 hdfs://namenode1:8020/path2  : using the default port. I think by the time you ask the filesystem for this (FileSystem.getURI()), it may have been patched up
```

No match:
```
file:///file1 file://host/file2  : no auth in src URI (Sean's problem)
file://host/file1 file:///file2
file://host/file1 file://host2/file2
wasb://bucket1@user wasb://bucket2@user/
wasb://bucket1@user wasb://bucket1@user2/
s3a://user@pass:bucket1/ s3a://user2@pass2:bucket1/  : we do a bit of secret stripping in S3A, so this may end up working in real life. Could relax that to retaining user@ though, if we retain it at all
hdfs:/path1 hdfs:/path2
hdfs://namenode1/path1 hdfs://namenode1:8080/path2
hdfs://namenode1:8020/path1 hdfs://namenode1:8080/path2
```


See? It's complex. Add the parameterised test and then it becomes easier to 
review and maintain, and to be confident those corner cases are being handled.
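For reference when reading the lists above, this sketch shows how `java.net.URI` itself splits a few of those URIs. Where the text after `@` is not a valid `host:port` (as in `s3a://user@pass:bucket1/` or `file://c:file2`), `URI` falls back to a registry-based authority, so `getHost` and `getUserInfo` return null and only `getAuthority` keeps the full string. `UriParts` and `partsOf` are names made up for this example.

```scala
import java.net.URI

// The components that matter when deciding whether two URIs name the same filesystem.
final case class UriParts(authority: String, userInfo: String, host: String, port: Int)

def partsOf(s: String): UriParts = {
  val u = new URI(s)
  UriParts(u.getAuthority, u.getUserInfo, u.getHost, u.getPort)
}

Seq(
  "file:///file1",
  "file://c:file2",
  "wasb://bucket1@user",
  "s3a://user@pass:bucket1/",
  "hdfs:/path1",
  "hdfs://namenode1:8020/path2"
).foreach(s => println(s"$s -> ${partsOf(s)}"))
```

A comparison that looks only at `getHost` would therefore treat the two s3a URIs in the no-match list as identical (both null hosts), which is one reason the authority has to be part of the check.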


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #84455 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84455/testReport)**
 for PR 19885 at commit 
[`8bbbf04`](https://github.com/apache/spark/commit/8bbbf046a8f706add8430dc6d34c551d87c4d5a1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84455/


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19885
  
**[Test build #84455 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84455/testReport)**
 for PR 19885 at commit 
[`8bbbf04`](https://github.com/apache/spark/commit/8bbbf046a8f706add8430dc6d34c551d87c4d5a1).


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
ok to test.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
Is this assumption based on the implementation of Hadoop `FileSystem`? I 
was thinking that wasb is the exception, and for other filesystems we would 
keep the original code.

@steveloughran, would you please comment?


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@jerryshao Yes, hdfs://us...@nn1.com:8020 and hdfs://us...@nn1.com:8020 
would be considered two filesystems, since the authority information should be 
taken into consideration. That is why the authority needs to be added to the 
check that compares two filesystems.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
@vanzin, please help review. Thanks!


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19885
  
@merlintang, would you please add the problem to your PR description? 
Currently it is a WASB problem, in which userInfo is honored to differentiate 
filesystems. Please add that scenario to the description.

Besides, the changes here will also affect all other filesystems such as 
HDFS and S3; do they still behave the same? What will happen with 
`hdfs://us...@nn1.com:8020` and `hdfs://us...@nn1.com:8020`: are they two 
filesystems? Would you please clarify?


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19885
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@jerryshao can you review this patch? 


---
