[jira] [Commented] (HIVE-19262) empty array will be saved as NULL by insert into select

2018-04-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447093#comment-16447093
 ] 

Gopal V commented on HIVE-19262:


You might be able to fix it at the top level with HIVE-13632.

> empty array will be saved as NULL by insert into select
> ---
>
> Key: HIVE-19262
> URL: https://issues.apache.org/jira/browse/HIVE-19262
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Affects Versions: 0.13.1
> Reporter: liupengcheng
> Priority: Major
>
> The data is generated by MR Parquet and contains empty lists.
> When the following SQL is executed, the empty-list column in the result 
> differs from the original data.
> `insert into table a as select * from b `
> {code:java}
> >select col1 from a where size(col1) = 0 limit 1;
>  []    // will show []
> >insert into table b select col1 from a;
> >select col1 from b;
>  NULL  // will show NULL
> {code}
> I wonder whether the insert should return the same result as the source and 
> leave the stored data unchanged.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-19262) empty array will be saved as NULL by insert into select

2018-04-21 Thread liupengcheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447085#comment-16447085
 ] 

liupengcheng commented on HIVE-19262:
-

[~gopalv] But as far as I can see, Hive's ParquetHiveSerDe converts an empty list 
to null. I don't understand why Hive does not just stay consistent with the 
official Parquet writer.

The official Parquet read/write path always keeps the result unchanged.
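
As a rough illustration of the round-trip property described above, here is a toy model in Python. It is not Hive or Parquet code: `write_collapsing`, `write_preserving`, and `round_trips` are hypothetical names, and the "writers" only mimic the two behaviours discussed in this thread (an empty list stored as NULL versus stored as an empty list).

```python
from typing import Callable, List, Optional

Row = Optional[List[int]]


def write_collapsing(value: Row) -> Row:
    """Hypothetical writer that stores an empty list as NULL (None)."""
    if value is None or len(value) == 0:
        return None
    return list(value)


def write_preserving(value: Row) -> Row:
    """Hypothetical writer that keeps empty lists distinct from NULL."""
    return None if value is None else list(value)


def round_trips(writer: Callable[[Row], Row], rows: List[Row]) -> bool:
    """Return True if writing every row leaves the data unchanged."""
    return [writer(row) for row in rows] == rows


rows: List[Row] = [[1, 2], [], None]
print(round_trips(write_preserving, rows))  # True
print(round_trips(write_collapsing, rows))  # False
```

Only the collapsing writer changes the data, because `[]` and `None` become indistinguishable after the write, which matches the `[]` versus `NULL` difference shown in the issue description.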



[jira] [Commented] (HIVE-19262) empty array will be saved as NULL by insert into select

2018-04-21 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-19262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446873#comment-16446873
 ] 

Gopal V commented on HIVE-19262:


This might be a problem with the Parquet file format, not Hive.
