GitHub user windpiger opened a pull request:
https://github.com/apache/spark/pull/17173
[SPARK-19832][SQL]DynamicPartitionWriteTask get partitionPath should escape
the partition name
## What changes were proposed in this pull request?
Currently, when DynamicPartitionWriteTask builds the partitionPath of a
partition, it escapes only the partition value, not the partition name.
This causes problems for partition names that contain special characters,
for example:
1) If the partition name contains '%' etc., two partition paths are created
in the filesystem: one for the escaped path, like '/path/a%25b=1', and
another for the unescaped path, like '/path/a%b=1'.
The inserted data is stored in the unescaped path, while show partitions on
the table returns 'a%25b=1', in which the partition name is escaped. So the
two are not consistent. I think the data should be stored in the escaped path
in the filesystem, which is also what Hive 2.0.0 does.
2) If the partition name contains ':', new Path("/path", "a:b") throws an
exception, because a colon is illegal in a relative path.
```
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative
path in absolute URI: a:b
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.fs.Path.<init>(Path.java:88)
... 48 elided
Caused by: java.net.URISyntaxException: Relative path in absolute URI: a:b
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 50 more
```
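To illustrate the intended behavior, here is a minimal, self-contained sketch. It is not the code in this patch: the class `PartitionPathSketch`, the helpers `escapePathName`, `partitionDir` and `uriRejected`, and the set of escaped characters are all hypothetical and chosen only for illustration. It escapes the partition name as well as the value, and uses `java.net.URI` to mimic the check that, judging by the stack trace above, rejects the unescaped name.

```java
import java.net.URI;
import java.net.URISyntaxException;

class PartitionPathSketch {
    // Illustrative set of characters to escape (an assumption, not Spark's actual list).
    private static final String SPECIAL = "\"#%'*/:=?\\{[]^";

    static boolean needsEscaping(char c) {
        return c < ' ' || c == 0x7F || SPECIAL.indexOf(c) >= 0;
    }

    // Percent-encode every special character as %XX (two upper-case hex digits).
    static String escapePathName(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (needsEscaping(c)) {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build the "name=value" directory with BOTH parts escaped.
    static String partitionDir(String name, String value) {
        return escapePathName(name) + "=" + escapePathName(value);
    }

    // Returns true if java.net.URI rejects the given scheme/path combination.
    static boolean uriRejected(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return false;
        } catch (URISyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(partitionDir("a%b", "1")); // a%25b=1
        System.out.println(partitionDir("a:b", "1")); // a%3Ab=1
        // Assumption: "a:b" is read as scheme "a" plus relative path "b",
        // which java.net.URI refuses ("Relative path in absolute URI").
        System.out.println(uriRejected("a", "b"));                      // true
        System.out.println(uriRejected(null, partitionDir("a:b", "1"))); // false
    }
}
```

With the name escaped, the directory written to the filesystem and the name reported by show partitions would agree, and the colon never reaches the URI check.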
## How was this patch tested?
unit test added
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/windpiger/spark fixDatasourceSpecialCharPartitionName
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17173.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17173
----
commit deb63bd9f3586b7f9c2e203f8486d6a7eb49bc72
Author: windpiger <[email protected]>
Date: 2017-03-06T06:57:25Z
[SPARK-19832][SQL]DynamicPartitionWriteTask get partitionPath should escape
the partition name
----