southernriver opened a new pull request #30707:
URL: https://github.com/apache/spark/pull/30707
### What changes were proposed in this pull request?
In the distributed hdfs storage system,Space and other special character are
allowed in the path:
>
hdfs://ns1/tmp2/hive-staging/hadoop_hive_2020-07-06_17-31-29_139_7042265710400397740-1/-ext-10000/test_table=2020-06-17
18%3A00%3A00/part-00000-84396c4e-ba05-4936-afc7-db46c4251bfa.c000
When we load data by using
```
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
org.apache.spark.sql.execution.datasources.orcOrcFileFormat.scala
org.apache.spark.sql.hive.orc.OrcFileFormat
```
, exception may throw as below:
```
Caused by: java.net.URISyntaxException: Illegal character in path at index
136:
hdfs://ns1/tmp2/hive-staging/hadoop_hive_2020-07-06_17-31-29_139_7042265710400397740-1/-ext-10000/test_table=2020-06-17
18%3A00%3A00/part-00000-84396c4e-ba05-4936-afc7-db46c4251bfa.c000
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3105)
at java.net.URI$Parser.parse(URI.java:3053)
at java.net.URI.<init>(URI.java:588)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:356)atorg.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
anonfunbuildReaderWithPartitionValues1.apply(ParquetFileFormat.scala:352)
at
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.orgapachesparksqlexecutiondatasourcesFileScanRDD
anon
readCurrentFile(FileScanRDD.scala:124)
at org.apache.spark.sql.execution.datasources.FileScanRDD
anon$1.nextIterator(FileScanRDD.scala:177)atorg.apache.spark.sql.execution.datasources.FileScanRDD
anon1.hasNext(FileScanRDD.scala:101)atorg.apache.spark.sql.execution.datasources.FileFormatWriteranonfunorgapachesparksqlexecutiondatasourcesFileFormatWriter
executeTask$3.apply(FileFormatWriter.scala:252)atorg.apache.spark.sql.execution.datasources.FileFormatWriter
anonfunorgapachesparksqlexecutiondatasourcesFileFormatWriterexecuteTask3.apply(FileFormatWriter.scala:250)
at
org.apache.spark.util.Utils.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)atorg.apache.spark.sql.execution.datasources.FileFormatWriter.orgapachesparksqlexecutiondatasourcesFileFormatWriter$$executeTask(FileFormatWriter.scala:256)
... 10 more
```
Hdfs has provided serveral construct function to build path:
https://github.com/apache/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Path.java
We could fall back to construct a path from a String rather than URI.
### Why are the changes needed?
It's reasonable to support all path of HDFS for module of ParquetFileFormat
or OrcFileFormat.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
manual
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]