[jira] [Updated] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

2017-08-29 Thread Zhizhen Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated HIVE-16332:
---
Status: Patch Available  (was: In Progress)

IMHO, ArrayList.ensureCapacity does not clear the data left over from the
previous row. When the array in the current row is shorter than the array in
the previous row, the list is not fully overwritten, and the elements that
were not overwritten are output.
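
The behaviour described above can be reproduced outside Hive with a small
Java sketch. It is only an illustration under stated assumptions: the class
ReusedListDemo and the helper fillReused are hypothetical names, and the
code is not taken from the Hive ORC reader or from the attached patch. It
shows that ensureCapacity only grows the backing array and leaves size()
unchanged, so a list reused across rows keeps its old tail unless the
caller trims it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReusedListDemo {
    // Hypothetical helper (not Hive code): copies the current row's values
    // into a list object that is reused from the previous row.
    static List<String> fillReused(ArrayList<String> reused,
                                   List<String> current,
                                   boolean trim) {
        // Only guarantees capacity; it never removes or clears elements.
        reused.ensureCapacity(current.size());
        for (int i = 0; i < current.size(); i++) {
            if (i < reused.size()) {
                reused.set(i, current.get(i)); // overwrite an old slot
            } else {
                reused.add(current.get(i));    // grow past the old size
            }
        }
        // One possible remedy: drop the elements left over from the previous row.
        if (trim && reused.size() > current.size()) {
            reused.subList(current.size(), reused.size()).clear();
        }
        return reused;
    }

    public static void main(String[] args) {
        ArrayList<String> row = new ArrayList<>();
        fillReused(row, Arrays.asList("array_value1", "array_value2", "array_value3"), false);
        fillReused(row, Arrays.asList("array_value4"), false);
        // Stale tail survives: [array_value4, array_value2, array_value3]
        System.out.println(row);

        ArrayList<String> trimmed = new ArrayList<>();
        fillReused(trimmed, Arrays.asList("array_value1", "array_value2", "array_value3"), true);
        fillReused(trimmed, Arrays.asList("array_value4"), true);
        // Tail removed: [array_value4]
        System.out.println(trimmed);
    }
}
```

The first printed list matches the wrong value reported for row 2 in the
reproduction steps quoted below. Trimming with subList(...).clear() is just
one way to discard the stale elements; clearing the list before refilling
it would have the same effect.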

> When create a partitioned text format table with one partition, after we 
> change the format of table to orc, then the array type field may output error.
> ---
>
> Key: HIVE-16332
> URL: https://issues.apache.org/jira/browse/HIVE-16332
> Project: Hive
> Issue Type: Bug
> Components: ORC
> Affects Versions: 2.1.1
> Reporter: Zhizhen Hou
> Assignee: Zhizhen Hou
> Priority: Critical
> Labels: patch
> Attachments: HIVE-16332.1.patch
>
>
> ## Steps to reproduce
> 1. First, create a text-format table with an array-type field in Hive.
> ```
> -- element types inferred from the query output
> create table test_text_orc (
>   col_int bigint,
>   col_text string,
>   col_array array<string>,
>   col_map map<string,string>
> )
> PARTITIONED BY (
>   day string
> )
> ROW FORMAT DELIMITED
>   FIELDS TERMINATED BY ','
>   COLLECTION ITEMS TERMINATED BY ']'
>   MAP KEYS TERMINATED BY ':'
> ;
> ```
> 2. Create a new text file named hive-orc-text-file-array-error-test.txt.
> ```
> 1,text_value1,array_value1]array_value2]array_value3, map_key1:map_value1,map_key2:map_value2
> 2,text_value2,array_value4, map_key1:map_value3
> ,text_value3,, map_key1:]map_key3:map_value3
> ```
> 3.  Load the data into one partition.
> ```
> LOAD DATA local INPATH '.hive-orc-text-file-array-error-test.txt' overwrite into table test_text_orc partition(day=20170329)
> ```
> 4. Select the data to verify the result.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1  text_value1  ["array_value1","array_value2","array_value3"]  {" map_key1":"map_value1","map_key2":"map_value2"}  20170329
> 2  text_value2  ["array_value4"]  {"map_key1":"map_value3"}  20170329
> NULL  text_value3  []  {" map_key1":"","map_key3":"map_value3"}  20170329
> ```
> 5. Change the file format of the table to ORC.
> ```
>  alter table test_text_orc set fileformat orc;
> ```
> 6. Query the table again. The array column now returns wrong values: rows 2 and 3 contain elements left over from earlier rows.
> ```
> hive> select * from test.test_text_orc;
> OK
> 1  text_value1  ["array_value1","array_value2","array_value3"]  {" map_key1":"map_value1","map_key2":"map_value2"}  20170329
> 2  text_value2  ["array_value4","array_value2","array_value3"]  {"map_key1":"map_value3"}  20170329
> NULL  text_value3  ["array_value4","array_value2","array_value3"]  {"map_key3":"map_value3"," map_key1":""}  20170329
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

2017-08-29 Thread Zhizhen Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated HIVE-16332:
---
Attachment: HIVE-16332.1.patch

IMHO, ArrayList.ensureCapacity does not clear the data left over from the
previous row. When the array in the current row is shorter than the array in
the previous row, the list is not fully overwritten, and the elements that
were not overwritten are output.

> When create a partitioned text format table with one partition, after we 
> change the format of table to orc, then the array type field may output error.
> ---
>
> Key: HIVE-16332
> URL: https://issues.apache.org/jira/browse/HIVE-16332
> Project: Hive
> Issue Type: Bug
> Components: ORC
> Affects Versions: 2.1.1
> Reporter: Zhizhen Hou
> Assignee: Zhizhen Hou
> Priority: Critical
> Labels: patch
> Attachments: HIVE-16332.1.patch
>
>





[jira] [Updated] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

2017-08-29 Thread Zhizhen Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated HIVE-16332:
---
Status: Open  (was: Patch Available)

> When create a partitioned text format table with one partition, after we 
> change the format of table to orc, then the array type field may output error.
> ---
>
> Key: HIVE-16332
> URL: https://issues.apache.org/jira/browse/HIVE-16332
> Project: Hive
> Issue Type: Bug
> Components: ORC
> Affects Versions: 2.1.1
> Reporter: Zhizhen Hou
> Assignee: Zhizhen Hou
> Priority: Critical
> Labels: patch
>





[jira] [Updated] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

2017-08-29 Thread Zhizhen Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated HIVE-16332:
---
  Labels: patch  (was: )
Assignee: Zhizhen Hou
Tags: orc array
  Status: Patch Available  (was: Open)

IMHO, ArrayList.ensureCapacity does not clear the data left over from the
previous row. When the array in the current row is shorter than the array in
the previous row, the list is not fully overwritten, and the elements that
were not overwritten are output.

> When create a partitioned text format table with one partition, after we 
> change the format of table to orc, then the array type field may output error.
> ---
>
> Key: HIVE-16332
> URL: https://issues.apache.org/jira/browse/HIVE-16332
> Project: Hive
> Issue Type: Bug
> Components: ORC
> Affects Versions: 2.1.1
> Reporter: Zhizhen Hou
> Assignee: Zhizhen Hou
> Priority: Critical
> Labels: patch
>





[jira] [Updated] (HIVE-16332) When create a partitioned text format table with one partition, after we change the format of table to orc, then the array type field may output error.

2017-06-14 Thread Zhizhen Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou updated HIVE-16332:
---
Summary: When create a partitioned text format table with one partition, 
after we change the format of table to orc, then the array type field may 
output error.  (was: We create a partitioned text format table with one 
partition, after we change the format of table to orc, then the array type 
field may output error.)

> When create a partitioned text format table with one partition, after we 
> change the format of table to orc, then the array type field may output error.
> ---
>
> Key: HIVE-16332
> URL: https://issues.apache.org/jira/browse/HIVE-16332
> Project: Hive
> Issue Type: Bug
> Components: ORC
> Affects Versions: 2.1.1
> Reporter: Zhizhen Hou
> Priority: Critical
>


