GitHub user windpiger opened a pull request:

    https://github.com/apache/spark/pull/17173

    [SPARK-19832][SQL]DynamicPartitionWriteTask get partitionPath should escape the partition name

    ## What changes were proposed in this pull request?
    
    Currently, when DynamicPartitionWriteTask computes the partitionPath of a partition, it escapes only the partition value, not the partition name.
    
    This causes problems when the partition name contains special characters, for example:
    1) If the partition name contains '%' (or another character that gets escaped), two partition paths are created in the filesystem: an escaped one such as '/path/a%25b=1' and an unescaped one such as '/path/a%b=1'.
    The inserted data is stored in the unescaped path, while SHOW PARTITIONS for the table returns 'a%25b=1', with the partition name escaped, so the two are inconsistent. I think the data should be stored in the escaped path in the filesystem, which is also what Hive 2.0.0 does.
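
    A minimal sketch of the intended behavior, escaping both the partition name and the partition value when building the path fragment. It is plain Java; the class `PartitionPathSketch` and its methods are hypothetical, and the escaped character set is assumed to mirror the one used by Hive/Spark's `escapePathName` rather than copied from this patch.
    
    ```java
    import java.util.BitSet;
    import java.util.Map;
    
    // Hypothetical sketch, not Spark's API: names are illustrative and the escape
    // set is assumed to mirror Hive/Spark's escapePathName.
    final class PartitionPathSketch {
        private static final BitSet CHARS_TO_ESCAPE = new BitSet(128);
    
        static {
            // ASCII control characters 0x01-0x1F and DEL (0x7F).
            for (int c = 0x01; c <= 0x1F; c++) {
                CHARS_TO_ESCAPE.set(c);
            }
            CHARS_TO_ESCAPE.set(0x7F);
            // Printable characters that are unsafe in a partition directory name.
            for (char c : "\"#%'*/:=?\\{[]^".toCharArray()) {
                CHARS_TO_ESCAPE.set(c);
            }
        }
    
        // Replaces each character in the escape set with '%' plus its two-digit uppercase hex code.
        static String escapePathName(String s) {
            StringBuilder sb = new StringBuilder();
            for (char c : s.toCharArray()) {
                if (c < 128 && CHARS_TO_ESCAPE.get(c)) {
                    sb.append('%').append(String.format("%02X", (int) c));
                } else {
                    sb.append(c);
                }
            }
            return sb.toString();
        }
    
        // Builds "name1=value1/name2=value2", escaping the name as well as the value.
        static String partitionPath(Map<String, String> spec) {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : spec.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('/');
                }
                sb.append(escapePathName(e.getKey()))
                  .append('=')
                  .append(escapePathName(e.getValue()));
            }
            return sb.toString();
        }
    }
    ```
    
    For example, a single-entry spec mapping `a%b` to `1` produces `a%25b=1`, which matches the partition name that SHOW PARTITIONS reports.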
    
    2) If the partition name contains ':', an exception is thrown, because new Path("/path", "a:b") is illegal: Hadoop parses the child "a:b" as a URI with scheme "a" and the relative path "b", and a relative path is not allowed in an absolute URI.
    
    ```
    java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: a:b
      at org.apache.hadoop.fs.Path.initialize(Path.java:205)
      at org.apache.hadoop.fs.Path.<init>(Path.java:171)
      at org.apache.hadoop.fs.Path.<init>(Path.java:88)
      ... 48 elided
    Caused by: java.net.URISyntaxException: Relative path in absolute URI: a:b
      at java.net.URI.checkPath(URI.java:1823)
      at java.net.URI.<init>(URI.java:745)
      at org.apache.hadoop.fs.Path.initialize(Path.java:202)
      ... 50 more
    ```
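
    The root cause can be reproduced with `java.net.URI` alone. The sketch below assumes, based on a reading of Hadoop's `Path` rather than on this patch, that when a child string has a ':' before any '/', the text before the colon is treated as a URI scheme; `ColonInPartitionName` and `tryBuildUri` are hypothetical names.
    
    ```java
    import java.net.URI;
    import java.net.URISyntaxException;
    
    // Hypothetical sketch reproducing the "Relative path in absolute URI" error without Hadoop.
    // Assumption: Hadoop's Path splits "a:b" into scheme "a" and path "b" before building a URI.
    final class ColonInPartitionName {
        // Mimics that split, then builds a URI.
        // Returns null on success, or the URISyntaxException message on failure.
        static String tryBuildUri(String child) {
            int colon = child.indexOf(':');
            int slash = child.indexOf('/');
            String scheme = null;
            String path = child;
            if (colon != -1 && (slash == -1 || colon < slash)) {
                scheme = child.substring(0, colon);
                path = child.substring(colon + 1);
            }
            try {
                // A scheme combined with a path that does not start with '/' is rejected.
                new URI(scheme, null, path, null, null);
                return null;
            } catch (URISyntaxException e) {
                return e.getMessage();
            }
        }
    }
    ```
    
    `tryBuildUri("a:b")` returns `Relative path in absolute URI: a:b`, the same message as the `Caused by` line in the trace, while the escaped name `a%3Ab` (':' is 0x3A) builds without error.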

    ## How was this patch tested?
    
    A unit test was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark fixDatasourceSpecialCharPartitionName

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17173
    
----
commit deb63bd9f3586b7f9c2e203f8486d6a7eb49bc72
Author: windpiger
Date:   2017-03-06T06:57:25Z

    [SPARK-19832][SQL]DynamicPartitionWriteTask get partitionPath should escape the partition name

----