dongjoon-hyun commented on pull request #32184:
URL: https://github.com/apache/spark/pull/32184#issuecomment-820459844
> @dongjoon-hyun looked into that PR, it provided a Http server. This PR aims to support the remote file using HDFS api, seems we can not add a useful test.
>
> Update the pr title to make it clear.
Hi, @ulysses-you. It seems that you are underestimating your PR's contribution. :)
The Hadoop API is an abstraction layer that supports various file systems, like the following.
```scala
scala> new org.apache.hadoop.fs.Path("file:///tmp/README.md").getFileSystem(sc.hadoopConfiguration)
res0: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.hive.ql.io.ProxyLocalFileSystem@1def2d16

scala> new org.apache.hadoop.fs.Path("https://spark.apache.org/index.html").getFileSystem(sc.hadoopConfiguration)
res1: org.apache.hadoop.fs.FileSystem = org.apache.hadoop.fs.http.HttpsFileSystem@4058b398

scala> new org.apache.hadoop.fs.Path("s3a://dongjoon/README.md").getFileSystem(sc.hadoopConfiguration)
res2: org.apache.hadoop.fs.FileSystem = S3AFileSystem{uri=s3a://dongjoon,...
```
Have you tried putting your pool file on other file systems, such as an HTTP web server or S3 (or S3-compatible MinIO)? Do you mean it doesn't work for you, @ulysses-you?
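
For reference, a minimal sketch of reading a pool file through the Hadoop `FileSystem` abstraction, so the same code path handles `file://`, `hdfs://`, `https://`, or `s3a://` URIs. The bucket and file name below are hypothetical placeholders, not from the PR.

```scala
// Sketch only: resolve and read a remote pool file via the Hadoop FileSystem API.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import scala.io.Source

val conf = new Configuration()
// Hypothetical location; any scheme with a registered FileSystem implementation works the same way.
val path = new Path("s3a://my-bucket/fairscheduler.xml")
val fs = path.getFileSystem(conf)   // resolves the URI scheme to the matching FileSystem implementation
val in = fs.open(path)              // same call regardless of the underlying file system
try {
  Source.fromInputStream(in).getLines().foreach(println)
} finally {
  in.close()
}
```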