GitHub user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/19885
  
    Hi. 
    If the comparison is isolated to a method testing URIs, rather than 
filesystems, it should be straightforward to write a suite of tests for this, 
with one list of URI pairs expected to match and a separate list of pairs 
expected not to match.
    That way we can review those combinations which people expect to 
match/don't match & see they meet our expectations, plus have somewhere to put 
new variants over time.
    
    So: do that test, then we can see if the code does what's needed. Once 
that's done I'll use it as a basis for defining what Path is meant to do in the 
Hadoop FS spec & tests.
    
    Things to check. Should match:
    ```
    file:///file1 file:///file2   : match; no auth
    file:///c:file1 file://c:file2  match, windows cruft. This is the bit of 
Path which is most trouble
    file://host/file1 file://host/file2 
    wasb://bucket1@user wasb://bucket1@user/  
    hdfs:/path1 hdfs:/path2   -- "default" FS; may be patched by the time you 
get to FileSystem.getUri()
    hdfs://namenode1/path1 hdfs://namenode1:8020/path2    -- using default port. 
I think by the time you ask the filesystem for this (FileSystem.getUri()) it 
may have been patched up
    
    ```
    
    no match:
    ```
    file:///file1 file://host/file2  :no auth in src URI (sean's problem)
    file://host/file1 file:///file2
    file://host/file1 file://host2/file2 
    wasb://bucket1@user wasb://bucket2@user/  
    wasb://bucket1@user wasb://bucket1@user2/  
    s3a://user@pass:bucket1/  s3a://user2@pass2:bucket1/   (we do a bit of 
secret stripping in S3A, so this may end up working in real life. Could relax 
that to retaining user@ though, if we retain it at all)
    hdfs:/path1 hdfs:/path2
    hdfs://namenode1/path1 hdfs://namenode1:8080/path2  
    hdfs://namenode1:8020/path1 hdfs://namenode1:8080/path2
     ```
    
    
    See? It's complex. Add the parameterised test and then it becomes easier to 
review/maintain & be confident those corner cases are being handled.
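
As a rough illustration of the table-driven shape such a test could take, here is a minimal sketch using only `java.net.URI`. It is not the comparison code from the PR or anything in Hadoop: the class name `UriMatchSketch`, the helper names, and the assumption that hdfs defaults to port 8020 are all hypothetical and chosen only for the example. The Windows-path and S3A pairs are left out, since their handling is exactly what would need discussing.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Objects;

// Hypothetical sketch: compares scheme + authority only, never the path.
final class UriMatchSketch {

    // Assumption for illustration: hdfs defaults to 8020, other schemes have no default port.
    static int defaultPort(String scheme) {
        return "hdfs".equals(scheme) ? 8020 : -1;
    }

    // Lower-cased scheme, or null when the URI has none.
    static String scheme(URI uri) {
        return uri.getScheme() == null ? null : uri.getScheme().toLowerCase(Locale.ROOT);
    }

    // Normalised authority: user info, lower-cased host, explicit or default port.
    static String authority(URI uri) {
        if (uri.getHost() == null) {
            // No parsed host (e.g. file:///path, hdfs:/path): fall back to the raw authority, often null.
            return uri.getAuthority();
        }
        int port = uri.getPort() != -1 ? uri.getPort() : defaultPort(scheme(uri));
        String user = uri.getUserInfo() == null ? "" : uri.getUserInfo() + "@";
        String host = uri.getHost().toLowerCase(Locale.ROOT);
        return user + host + (port == -1 ? "" : ":" + port);
    }

    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(scheme(a), scheme(b))
            && Objects.equals(authority(a), authority(b));
    }

    // Pairs taken from the "match" list above.
    static final String[][] MATCH = {
        {"file:///file1", "file:///file2"},
        {"file://host/file1", "file://host/file2"},
        {"wasb://bucket1@user", "wasb://bucket1@user/"},
        {"hdfs://namenode1/path1", "hdfs://namenode1:8020/path2"},
    };

    // Pairs taken from the "no match" list above.
    static final String[][] NO_MATCH = {
        {"file:///file1", "file://host/file2"},
        {"file://host/file1", "file:///file2"},
        {"file://host/file1", "file://host2/file2"},
        {"wasb://bucket1@user", "wasb://bucket2@user/"},
        {"wasb://bucket1@user", "wasb://bucket1@user2/"},
        {"hdfs://namenode1/path1", "hdfs://namenode1:8080/path2"},
        {"hdfs://namenode1:8020/path1", "hdfs://namenode1:8080/path2"},
    };

    // Runs every pair and reports those whose result differs from the expectation.
    static int countFailures(String[][] pairs, boolean expected) {
        int failures = 0;
        for (String[] p : pairs) {
            boolean actual = sameFileSystem(URI.create(p[0]), URI.create(p[1]));
            if (actual != expected) {
                failures++;
                System.out.println("FAIL: " + p[0] + " vs " + p[1] + " (expected match=" + expected + ")");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = countFailures(MATCH, true) + countFailures(NO_MATCH, false);
        System.out.println("pairs failing expectations: " + failures);
    }
}
```

New variants then become one more row in `MATCH` or `NO_MATCH`, which keeps the expectations reviewable in one place. In a real suite the two arrays would presumably feed whatever parameterised-test mechanism the project already uses, rather than a `main` method.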


---
