[ 
https://issues.apache.org/jira/browse/HIVE-14989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruslan Dautkhanov resolved HIVE-14989.
--------------------------------------
       Resolution: Duplicate
    Fix Version/s: 0.14.1

> FIELDS TERMINATED BY parsing broken when delimiter is more than 1 byte
> ----------------------------------------------------------------------
>
>                 Key: HIVE-14989
>                 URL: https://issues.apache.org/jira/browse/HIVE-14989
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats, Parser, Reader
>    Affects Versions: 0.13.0, 0.13.1
>            Reporter: Ruslan Dautkhanov
>             Fix For: 0.14.1
>
>
> FIELDS TERMINATED BY parsing is broken when the delimiter is more than 1 byte. 
> The delimiter's characters from the 2nd one onward become part of the 
> returned data, so rows are not parsed properly.
> Test case:
> {noformat}
> CREATE external TABLE test_muldelim
> (  string1 STRING,
>    string2 STRING,
>    string3 STRING
> )
>  ROW FORMAT 
>        DELIMITED FIELDS TERMINATED BY '<>'
>       LINES TERMINATED BY '\n'
>  STORED AS TEXTFILE
>   location '/user/hive/test_muldelim'
> {noformat}
> Create a text file under /user/hive/test_muldelim with the following 2 lines:
> {noformat}
> data1<>data2<>data3
> aa<>bb<>cc
> {noformat}
> Now notice that the two-character delimiter wasn't parsed properly:
> {noformat}
> jdbc:hive2://host.domain.com:1> select * from ruslan_test.test_muldelim ;
> +------------------------+------------------------+------------------------+--+
> | test_muldelim.string1  | test_muldelim.string2  | test_muldelim.string3  |
> +------------------------+------------------------+------------------------+--+
> | data1                  | >data2                 | >data3                 |
> | aa                     | >bb                    | >cc                    |
> +------------------------+------------------------+------------------------+--+
> 2 rows selected (0.453 seconds)
> {noformat}
> The delimiter's second character ('>') became part of the columns to the 
> right (`string2` and `string3`).
> Table DDL:
> {noformat}
> 0: jdbc:hive2://host.domain.com:1> show create table default.test_muldelim ;
> +-----------------------------------------------------------------+--+
> |                         createtab_stmt                          |
> +-----------------------------------------------------------------+--+
> | CREATE EXTERNAL TABLE `default.test_muldelim`(                  |
> |   `string1` string,                                             |
> |   `string2` string,                                             |
> |   `string3` string)                                             |
> | ROW FORMAT DELIMITED                                            |
> |   FIELDS TERMINATED BY '<>'                                     |
> |   LINES TERMINATED BY '\n'                                      |
> | STORED AS INPUTFORMAT                                           |
> |   'org.apache.hadoop.mapred.TextInputFormat'                    |
> | OUTPUTFORMAT                                                    |
> |   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'  |
> | LOCATION                                                        |
> |   'hdfs://epsdatalake/user/hive/test_muldelim'                  |
> | TBLPROPERTIES (                                                 |
> |   'transient_lastDdlTime'='1476727100')                         |
> +-----------------------------------------------------------------+--+
> 15 rows selected (0.286 seconds)
> {noformat}
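
The query output in the description is what you would get if only the first byte of the delimiter ('<') were used to split each row. The Python sketch below reproduces that behavior outside Hive so it is easy to compare against a split on the full delimiter. It assumes that the reader honors only the first delimiter byte, which matches the symptom reported here but is not confirmed by this message. Both helper names are hypothetical and not part of Hive.

```python
from typing import List


def split_first_byte_only(line: str, delimiter: str) -> List[str]:
    """Split on the first character of the delimiter only.

    Hypothetical helper: models the ASSUMPTION that the reader ignores
    every delimiter byte after the first one.
    """
    return line.split(delimiter[0])


def split_full_delimiter(line: str, delimiter: str) -> List[str]:
    """Split on the complete multi-character delimiter (expected behavior)."""
    return line.split(delimiter)


if __name__ == "__main__":
    # The two sample rows from the test case in the description.
    rows = ["data1<>data2<>data3", "aa<>bb<>cc"]
    for row in rows:
        print("first byte only:", split_first_byte_only(row, "<>"))
        print("full delimiter :", split_full_delimiter(row, "<>"))
```

With the first-byte split, the leftover '>' ends up at the start of the second and third fields, as in the `select *` output above. The full-delimiter split returns the three clean values the table definition was meant to produce.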



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
