[ 
https://issues.apache.org/jira/browse/SPARK-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Song Jun updated SPARK-19332:
-----------------------------
    Description: 
SPARK-19257 changes the type of `CatalogStorageFormat`'s `locationUri` to
`URI`, but this causes some problems:

1. `CatalogTable` and `CatalogTablePartition` share the same class,
`CatalogStorageFormat`.
2. The type `URI` is fine for `CatalogTable`, but it is not suitable for
`CatalogTablePartition`.
3. The location of a table partition can contain unencoded whitespace, and
converting such a location to a `URI` throws an exception. For example,
`/path/2014-01-01 00%3A00%3A00` is a partition location that contains
whitespace.

So changing the type to `URI` is a problem for `CatalogTablePartition`.
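
The failure described in point 3 can be reproduced with plain `java.net.URI`, without any Spark or Hive code. The sketch below is only an illustration: the class and method names (`PartitionLocationUriDemo`, `parsesAsUri`, `quotePath`) are hypothetical and are not part of either project. It also shows that the multi-argument `URI` constructors quote illegal characters instead of rejecting them, and that `getPath()` then returns the original string.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of Spark or Hive.
final class PartitionLocationUriDemo {

    // Returns true if the single-argument URI constructor accepts the string as-is.
    static boolean parsesAsUri(String location) {
        try {
            new URI(location);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds a URI from a raw path with the (scheme, host, path, fragment)
    // constructor, which quotes illegal characters and always quotes '%'.
    static URI quotePath(String rawPath) {
        try {
            return new URI(null, null, rawPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String location = "/path/2014-01-01 00%3A00%3A00";

        // The unencoded space makes the plain parse fail.
        System.out.println("parses as URI: " + parsesAsUri(location));

        // The quoting constructor escapes the space and the '%' signs.
        URI quoted = quotePath(location);
        System.out.println("raw path:      " + quoted.getRawPath());

        // Decoding the path gives back the original partition location.
        System.out.println("decoded path:  " + quoted.getPath());
    }
}
```

Whether such quoting is appropriate for partition locations is a separate design question; the example only shows why passing the raw location string to `new URI(String)` fails.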

Hive had the same issue, HIVE-6185. Before Hive 0.13 the location was a URI;
the fix for that issue changed it to a Path and added some checks during DDL:

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java#L1553
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java#L3732

So I think we can do the URI check for the table's location instead; it is
not appropriate to change the type to URI.
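
A check of the table's location at DDL time, as proposed above, could look roughly like the following. This is a minimal sketch under assumptions: `TableLocationCheck` and `requireLegalUri` are hypothetical names, the example location `hdfs://nn:8020/warehouse/t1` is made up, and the code is neither Spark's nor Hive's actual implementation.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not a Spark or Hive API.
final class TableLocationCheck {

    // Parses the table location and fails with a descriptive error if it is
    // not a legal URI; the stored type of the location is left unchanged.
    static URI requireLegalUri(String tableLocation) {
        try {
            return new URI(tableLocation);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                "Illegal table location '" + tableLocation + "': " + e.getReason(), e);
        }
    }

    public static void main(String[] args) {
        // A well-formed location passes the check.
        URI ok = requireLegalUri("hdfs://nn:8020/warehouse/t1");
        System.out.println("accepted: " + ok);

        // A location with unencoded whitespace is rejected up front.
        try {
            requireLegalUri("/warehouse/my table");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check applies only to table locations; partition locations, which may legitimately contain unencoded whitespace, would not go through it.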


> table's location should check if a URI is legal
> -----------------------------------------------
>
>                 Key: SPARK-19332
>                 URL: https://issues.apache.org/jira/browse/SPARK-19332
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Song Jun



