[jira] [Commented] (HIVE-15475) JsonSerDe cannot handle json file with empty lines

2019-02-24 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776298#comment-16776298
 ] 

BELUGA BEHR commented on HIVE-15475:


Nope. OK.  Figured it out.

This issue was inadvertently fixed as part of [HIVE-18545] (Jul 10, 2018).  
Prior to that change, JSON handling was done by 
{{org.apache.hive.hcatalog.data.JsonSerDe}}.

The issue was that this class was not handling the provided {{Text}} object 
correctly.  The {{Text}} object has two components: an internal array of 
bytes *and* a size that indicates how many of those bytes are to be processed.  
{{JsonSerDe}} was not taking the size into account, so when a zero-length 
{{Text}} object was submitted, it would still read the entire internal byte 
array, ignoring the zero size, and produce duplicates where there should be no 
text.

https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java#L168
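
To make the pitfall concrete, here is a minimal sketch in plain Java. It does not use Hadoop: {{ReusableBuffer}} is a hypothetical stand-in for {{Text}} that mimics only the relevant behavior (a backing byte array that is reused between records, plus a separate valid length), and the two decode methods are illustrative names, not code from {{JsonSerDe}}. The real {{Text}} exposes the same pair as {{getBytes()}} and {{getLength()}}.

```java
import java.nio.charset.StandardCharsets;

public class TextLengthSketch {

    /** Hypothetical stand-in for Hadoop's Text: a reused backing array plus a valid length. */
    static final class ReusableBuffer {
        private byte[] bytes = new byte[0];
        private int length = 0;

        /** Copies the record into the backing array, growing it only when needed. */
        void set(String record) {
            byte[] src = record.getBytes(StandardCharsets.UTF_8);
            if (src.length > bytes.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
        }

        /** Returns the backing array; bytes past getLength() may be stale. */
        byte[] getBytes() { return bytes; }

        /** Returns how many bytes of the backing array are valid. */
        int getLength() { return length; }
    }

    /** Buggy read: decodes the whole backing array and ignores the valid length. */
    static String decodeIgnoringLength(ReusableBuffer buf) {
        return new String(buf.getBytes(), StandardCharsets.UTF_8);
    }

    /** Correct read: decodes only the first getLength() bytes. */
    static String decodeWithLength(ReusableBuffer buf) {
        return new String(buf.getBytes(), 0, buf.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBuffer buf = new ReusableBuffer();
        buf.set("{\"a\":\"a_1\", \"b\" : 1}"); // the one real line in the file
        buf.set("");                            // a trailing empty line reuses the buffer

        System.out.println("ignoring length: [" + decodeIgnoringLength(buf) + "]");
        System.out.println("with length:     [" + decodeWithLength(buf) + "]");
    }
}
```

In this sketch the length-ignoring read still sees the previous record after the empty line is set, while the length-aware read sees an empty string.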

> JsonSerDe cannot handle json file with empty lines
> --
>
> Key: HIVE-15475
> URL: https://issues.apache.org/jira/browse/HIVE-15475
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 1.2.1
> Reporter: pin_zhang
> Priority: Major
>
> 1. Start HiveServer2 in apache-hive-1.2.1.
> 2. Start beeline, connect to HiveServer2, and run:
>    ADD JAR /home/apache-hive-1.2.1-bin/hcatalog/share/hcatalog/hive-hcatalog-core-1.2.1.jar;
>    CREATE external TABLE my_table(a string, b bigint)
>    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>    STORED AS TEXTFILE
>    location 'file:///home/hive/json';
> 3. Put a file that ends with more than one newline:
>    {"a":"a_1", "b" : 1}
> 4. Run the query:
>    select * from my_table;
> +-------------+-------------+--+
> | my_table.a  | my_table.b  |
> +-------------+-------------+--+
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> | a_1         | 1           |
> +-------------+-------------+--+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-15475) JsonSerDe cannot handle json file with empty lines

2019-02-23 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775965#comment-16775965
 ] 

BELUGA BEHR commented on HIVE-15475:


I've been digging into this as part of HIVE-21240.

I'm pretty sure that this is related to [MAPREDUCE-6549], [MAPREDUCE-6481], 
and [MAPREDUCE-6558], which have all been fixed in Hadoop 2.6.3/2.6.5.

However, Hive 2.1 uses Hadoop 2.6.1:

https://github.com/apache/hive/blob/rel/release-2.1.1/pom.xml#L135

You have to use Hive 2.2.0 or higher:


https://github.com/apache/hive/blob/rel/release-2.2.0/pom.xml#L141

