[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2018-04-05 Jelmer Kuperus (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426938#comment-16426938 ]

Jelmer Kuperus commented on HIVE-14044:
---

It seems that the problem is with the result format. Either setting

    set hive.query.result.fileformat = SequenceFile;

or 

    set hive.fetch.task.conversion=more;

worked as a workaround for me.
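A simplified model of why the result file format matters (a sketch, not Hive's code): a plain text result file uses the newline as its record delimiter, so a value containing a newline is read back as two records, whereas a format that stores an explicit length per record, as SequenceFile records do, never has to scan for a delimiter. The helper names and the 4-byte big-endian length prefix below are assumptions for illustration only; they do not reproduce the actual SequenceFile layout.

```python
import struct

# Hypothetical rows, already serialized to strings; the first one contains
# a real newline inside a map value, as in the reported data.
ROWS = [
    '1\t{"key2":"value2","key1":"value1\nafter newline"}',
    '2\t{"key2":"new value2","key1":"new value"}',
]


def write_text(rows):
    """Newline-delimited framing, as in a plain text result file."""
    return "".join(row + "\n" for row in rows).encode("utf-8")


def read_text(data):
    """Split on newlines; an embedded newline is indistinguishable from a row break."""
    return data.decode("utf-8").split("\n")[:-1]


def write_length_prefixed(rows):
    """Prefix each record with its byte length (4-byte big-endian, an assumption)."""
    out = bytearray()
    for row in rows:
        payload = row.encode("utf-8")
        out += struct.pack(">I", len(payload))
        out += payload
    return bytes(out)


def read_length_prefixed(data):
    """Read each record by its declared length; newlines inside are just bytes."""
    rows, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        rows.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return rows


if __name__ == "__main__":
    text_rows = read_text(write_text(ROWS))
    framed_rows = read_length_prefixed(write_length_prefixed(ROWS))
    print(len(text_rows), len(framed_rows))  # 3 2
```

With the text framing, the extra "row" is the fragment after the newline, which is consistent with the extra row in the reported output; the length-prefixed framing returns both rows intact.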

> Newlines in Avro maps cause external table to return corrupt values
> ---
>
> Key: HIVE-14044
> URL: https://issues.apache.org/jira/browse/HIVE-14044
> Project: Hive
> Issue Type: Bug
> Environment: Hive version: 1.1.0-cdh5.5.1 (bundled with Cloudera 5.5.1)
> Reporter: David Nies
> Assignee: Sahil Takiar
> Priority: Critical
> Attachments: test.json, test.schema
>
>
> When {{\n}} characters are contained in Avro files that are used as the data
> source for an external table, the results of {{SELECT}} queries may be corrupt.
> I encountered this error when querying Hive both from {{beeline}} and over
> JDBC.
> h3. Steps to reproduce (the files used are attached to the ticket)
> # Create an {{.avro}} file that contains newline characters in a map value:
> {code}
> avro-tools fromjson --schema-file test.schema test.json > test.avro
> {code}
> # Copy the {{.avro}} file to HDFS:
> {code}
> hdfs dfs -copyFromLocal test.avro /some/location/
> {code}
> # In beeline, create an external table backed by this {{.avro}} file:
> {code}
> beeline> CREATE EXTERNAL TABLE broken_newline_map
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION '/some/location/'
> TBLPROPERTIES ('avro.schema.literal'='
> {
>   "type" : "record",
>   "name" : "myEntry",
>   "namespace" : "myNamespace",
>   "fields" : [ {
>     "name" : "foo",
>     "type" : "long"
>   }, {
>     "name" : "bar",
>     "type" : {
>       "type" : "map",
>       "values" : "string"
>     }
>   } ]
> }
> ');
> {code}
> # Now, selecting may return corrupt results:
> {code}
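> jdbc:hive2://my-server:1/> select * from broken_newline_map;
> +-------------------------+---------------------------------------------------+--+
> | broken_newline_map.foo  |              broken_newline_map.bar               |
> +-------------------------+---------------------------------------------------+--+
> | 1                       | {"key2":"value2","key1":"value1\nafter newline"}  |
> | 2                       | {"key2":"new value2","key1":"new value"}          |
> +-------------------------+---------------------------------------------------+--+
> 2 rows selected (1.661 seconds)
> jdbc:hive2://my-server:1/> select foo, map_keys(bar), map_values(bar) from broken_newline_map;
> +-------+------------------+-----------------------------+--+
> |  foo  |       _c1        |             _c2             |
> +-------+------------------+-----------------------------+--+
> | 1     | ["key2","key1"]  | ["value2","value1"]         |
> | NULL  | NULL             | NULL                        |
> | 2     | ["key2","key1"]  | ["new value2","new value"]  |
> +-------+------------------+-----------------------------+--+
> 3 rows selected (28.05 seconds)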
> {code}
> Clearly, the last result set contains a corrupt row (row 2) and an incorrect
> row (row 1, where the value of {{key1}} is truncated to {{value1}}). I also
> encountered this when running the query over JDBC.
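The shape of the corrupt result set above can be reproduced with a small model. It assumes (an assumption about Hive internals, not something stated in the ticket) that the intermediate rows of the {{map_keys}}/{{map_values}} query are written as newline-delimited text using LazySimpleSerDe-style default delimiters, Ctrl-A between fields and Ctrl-B between collection items, and parsed back line by line. The functions {{serialize}} and {{parse_line}} are hypothetical.

```python
# Assumed LazySimpleSerDe-style default delimiters (illustrative only).
FIELD_SEP = "\x01"       # Ctrl-A between top-level fields
COLLECTION_SEP = "\x02"  # Ctrl-B between array items


def serialize(foo, keys, values):
    """Write one (foo, map_keys(bar), map_values(bar)) row as a single text line."""
    return FIELD_SEP.join([str(foo), COLLECTION_SEP.join(keys), COLLECTION_SEP.join(values)])


def parse_line(line):
    """Parse a line back into a row; missing or unparsable fields become None (NULL)."""
    fields = line.split(FIELD_SEP)
    fields += [None] * (3 - len(fields))  # pad short lines with NULLs
    foo_raw, keys_raw, values_raw = fields[:3]
    try:
        foo = int(foo_raw)
    except (TypeError, ValueError):
        foo = None
    keys = keys_raw.split(COLLECTION_SEP) if keys_raw is not None else None
    values = values_raw.split(COLLECTION_SEP) if values_raw is not None else None
    return foo, keys, values


# The two rows from the ticket's output; the value for key1 in row 1 holds a real newline.
rows = [
    (1, ["key2", "key1"], ["value2", "value1\nafter newline"]),
    (2, ["key2", "key1"], ["new value2", "new value"]),
]

# Rows are joined with "\n", so the reader treats the embedded newline as a row break.
text = "\n".join(serialize(*row) for row in rows)
parsed = [parse_line(line) for line in text.split("\n")]

for row in parsed:
    print(row)
# (1, ['key2', 'key1'], ['value2', 'value1'])
# (None, None, None)
# (2, ['key2', 'key1'], ['new value2', 'new value'])
```

Under these assumptions the model yields the same three rows as the reported output: row 1 loses everything after the newline, and the leftover fragment "after newline" cannot be parsed as {{foo}}, so it surfaces as an all-NULL row.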



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2018-04-04 Jelmer Kuperus (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425849#comment-16425849 ]

Jelmer Kuperus commented on HIVE-14044:
---

[~Sh4pe], I think that's only for LazySimpleSerDe.

Looking at this code, it declares the SERIALIZATION_ESCAPE_CRLF property:

[https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java#L69]

but the Avro one doesn't:

[https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java#L46]

Specifying it on the table does absolutely nothing for me on CDH 5.9.2.
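For comparison, the general idea behind CR/LF escaping (which, per the links above, LazySimpleSerDe exposes through SERIALIZATION_ESCAPE_CRLF and AvroSerDe does not) can be sketched as a reversible transformation: backslash, LF and CR are replaced by two-character sequences before a value is written to a newline-delimited file, and restored on read. This is a minimal illustration; the helper names are hypothetical and it does not claim to match the exact escape sequences Hive writes.

```python
# Hypothetical escape table: backslash must be escaped too, or the
# transformation could not be reversed unambiguously.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r"}


def escape_crlf(value):
    """Replace backslash, LF and CR with two-character escape sequences."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_crlf(value):
    """Invert escape_crlf; an unknown escape keeps the character after the backslash."""
    out, i = [], 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            out.append(_UNESCAPES.get(value[i + 1], value[i + 1]))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    original = "value1\nafter newline"
    escaped = escape_crlf(original)
    print(escaped)  # value1\nafter newline  (a backslash followed by "n", no real newline)
    print(unescape_crlf(escaped) == original)  # True
```

Because the escaped value contains no raw newline, it survives a newline-delimited file intact; a SerDe that does not apply such escaping leaves the raw newline in place.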

[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2017-03-23 Sahil Takiar (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938844#comment-15938844 ]

Sahil Takiar commented on HIVE-14044:
-

Thanks for the pointer, Anthony. [~Sh4pe], if you could check whether HIVE-11785
fixes your issue, that would be great.

[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2017-01-06 Anthony Hsu (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806417#comment-15806417 ]

Anthony Hsu commented on HIVE-14044:


I believe this issue was fixed by HIVE-11785.

[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2016-07-06 Sahil Takiar (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365069#comment-15365069 ]

Sahil Takiar commented on HIVE-14044:
-

Is this still an issue for you? Have you seen this bug come up again?

[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2016-07-06 David Nies (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364005#comment-15364005 ]

David Nies commented on HIVE-14044:
---

Sadly, no. I just checked the Hive version:

{code}
$ hiveserver2 --version
Hive 1.1.0-cdh5.5.1
{code}

[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2016-07-05 Sahil Takiar (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363512#comment-15363512 ]

Sahil Takiar commented on HIVE-14044:
-

[~Sh4pe], is there any more environment information you can provide?

I tried to reproduce this on CDH 5.5.1, but I can't reproduce it there either.

[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values

2016-07-05 Sahil Takiar (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363467#comment-15363467 ]

Sahil Takiar commented on HIVE-14044:
-

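I checked, and this issue is no longer present in the master branch.

The query {{select foo, map_keys(bar), map_values(bar) from broken_newline_map}}
prints:

{code}
+------+------------------+-------------------------------------+--+
| foo  |        c1        |                 c2                  |
+------+------------------+-------------------------------------+--+
| 1    | ["key1","key2"]  | ["value1\nafter newline","value2"]  |
| 2    | ["key1","key2"]  | ["new value","new value2"]          |
+------+------------------+-------------------------------------+--+
{code}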
