dongjoon-hyun commented on pull request #32184:
URL: https://github.com/apache/spark/pull/32184#issuecomment-820459844


   > @dongjoon-hyun looked into that PR, it provided a Http server. This PR aims to support the remote file using HDFS api, seems we can not add a useful test.
   > 
   > Update the pr title to make it clear.
   
   
   Hi, @ulysses-you . It seems that you are underestimating your PR's contribution. :)
   The Hadoop FileSystem API is an abstraction layer that supports various file systems, like the following.
   
   ```scala
   scala> new org.apache.hadoop.fs.Path("file:///tmp/README.md").getFileSystem(sc.hadoopConfiguration)
   res0: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.hive.ql.io.ProxyLocalFileSystem@1def2d16
   
   scala> new org.apache.hadoop.fs.Path("https://spark.apache.org/index.html").getFileSystem(sc.hadoopConfiguration)
   res1: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.fs.http.HttpsFileSystem@4058b398
   
   scala> new org.apache.hadoop.fs.Path("s3a://dongjoon/README.md").getFileSystem(sc.hadoopConfiguration)
   res2: org.apache.hadoop.fs.FileSystem = S3AFileSystem{uri=s3a://dongjoon,...
   ```
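   
   For reference, here is a minimal sketch (not from this PR; the path and pool file name below are only placeholders) showing that the read path stays the same once the `FileSystem` is resolved from the scheme:
   
   ```scala
   // Minimal sketch: read a file's contents through the Hadoop FileSystem
   // abstraction. The placeholder path could just as well be hdfs://, s3a://,
   // or https://, since the FileSystem is resolved from the Path's scheme.
   import org.apache.hadoop.fs.Path
   import scala.io.Source
   
   val path = new Path("file:///tmp/fairscheduler.xml")  // hypothetical pool file location
   val fs = path.getFileSystem(sc.hadoopConfiguration)   // picks the FileSystem implementation by scheme
   val in = fs.open(path)
   try {
     println(Source.fromInputStream(in, "UTF-8").mkString)
   } finally {
     in.close()
   }
   ```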
   
   Have you tried putting your pool file on other file systems, such as an HTTP web server or S3 (or S3-compatible MinIO)? Do you mean it doesn't work for you, @ulysses-you?

